Evaluation of Dynamically Downscaled CMIP6‐CCAM Models Over Australia

High‐resolution climate change projections are increasingly necessary to inform climate policy and adaptation planning. Downscaling of global climate models (GCMs) is required to simulate the climate at the spatial scale relevant for local impacts. Here, we dynamically downscaled 15 CMIP6 GCMs to a 10 km resolution over Australia using the Conformal Cubic Atmospheric model (CCAM), creating the largest ensemble of high‐resolution downscaled CMIP6 projections for Australia. We compared the host CMIP6 models and downscaled simulations to the Australian Gridded Climate Data (AGCD) observational data and evaluated performance using the Kling‐Gupta efficiency and Perkins skill score. Downscaling improved performance over host GCMs for seasonal temperature and precipitation (10% and 43% respectively), and for annual cycles of temperature and precipitation (6% and 13% respectively). Downscaling also improved the fraction of dry days, reducing the bias for too many low‐rain days. The largest improvements were found in climate extremes, with enhancements to extreme minimum temperatures in all seasons varying from 142% to 201%, and to extreme precipitation of 52% in Austral winter and 47% in summer. The ensemble average integrated skill score improved by 16%. Temperature and precipitation biases were reduced in mountainous and coastal areas. CCAM downscaling outperformed host CMIP6 GCMs at multiple spatial scales and regions—continental Australia, Australian IPCC regions and Queensland's regions—with integrated added value ranging from 9% to 150% and higher over densely populated regions more exposed to climate impacts. This data set will be a valuable resource for understanding future climate changes in Australia.

Following CORDEX guidance, regions and countries have started producing their own high-resolution climate simulations. The first activities of the CORDEX initiative produced ensembles of projections over regions covering all continents (Gutowski et al., 2016) at a spatial resolution of 50 km or higher in some regions. Examples include the European domain of the CORDEX initiative (Jacob et al., 2014, 2020; Kotlarski et al., 2014), the North American domain (Bukovsky & Mearns, 2020; Komurcu et al., 2018; McGinnis & Mearns, 2021; Mearns et al., 2012), and the Australasian domain (Evans et al., 2021). More recently, CORDEX-CORE (Coordinated Output for Regional Evaluations; Giorgi, Coppola, Jacob, et al., 2021; Giorgi, Coppola, Teichmann, & Jacob, 2021; Gutowski et al., 2016) was initiated to provide a homogeneous set of projections for all CORDEX domains using a core set of Regional Climate Models (RCMs) driven by a common set of GCMs from CMIP5 simulations at 25 km spatial resolution. In 2021 the CORDEX experimental design for dynamically downscaled CMIP6 models was published (WCRP CORDEX, 2021), and many modeling groups are performing simulations to contribute to the global archive of regional projections for decision making and adaptation. In addition, high-resolution downscaled projections have been produced for individual countries, such as UKCP09 and UKCP18 in the UK (Jenkins et al., 2010; Lowe et al., 2018), and CH2011 and CH2018 in Switzerland (CH2011, 2011; Fischer et al., 2022; Sørland et al., 2020). In Australia, several regional projections at a 10 km spatial resolution or higher were produced by States, such as Tasmania (Corney et al., 2010), New South Wales (Evans et al., 2014; Nishant et al., 2021), Victoria (Clarke et al., 2019) and Queensland (Syktus, Toombs, et al., 2020; Syktus, Trancoso, et al., 2020).
Australia is particularly vulnerable to climate change due to its highly variable past and present-day climate, such as fluctuations of wet and dry periods and hydroclimatic extremes (Harris & Lucas, 2019; King et al., 2020). The variability of climate extremes is expected to increase as the climate continues to warm (Australian Academy of Science, 2021). Over the past decade, unprecedented natural hazards were attributed to the changing climate (Jan Van Oldenborgh et al., 2021; Lewis & Karoly, 2013), denoting the need to strengthen the guidelines to build preparedness for natural disasters. The Australian Government recognizes that the underlying data underpinning the assessment of natural disaster risks needs to go beyond historical observations to include downscaled climate projections and explicitly recommends that States and jurisdictions tackle this need (Commonwealth of Australia, 2020). Downscaled climate projections are vital for informing adaptation policies (Di Virgilio et al., 2019; Fischer et al., 2022) and climate risk assessments (Trancoso et al., 2020). However, for climate projections to be useful for local policy and adaptation, they need to be at a scale relevant for local impacts (Komurcu et al., 2018).
The Queensland government has previously produced high-resolution projections based on the CMIP5 models, QldFCP-1 (Syktus, Toombs, et al., 2020; Syktus, Trancoso, et al., 2020). These projections underpin the implementation of Queensland's Climate Adaptation Strategy and are part of the action plan to improve knowledge of climate change impacts in Queensland and underlie sector adaptation planning (Department of Environment and Heritage Protection, 2017). Queensland's downscaled projections have been made available online via the Queensland Future Climate Dashboard (https://www.longpaddock.qld.gov.au/qld-future-climate/dashboard/) and used by Queensland Fire and Emergency Services for assessing future heatwave (Chesnais et al., 2019; Trancoso et al., 2020) and severe wind risks in Queensland (Arthur et al., 2021), and in scientific assessments of drought (Spinoni et al., 2021), water quality (Eccles et al., 2023), flood risk (Chiew et al., 2022; Eccles et al., 2021), and rainfall over topography (Grose et al., 2019). However, there is a need to update these data sets to the latest CMIP6 models, and to provide consistent high-resolution projections for the country because different modeling groups are producing downscaled climate simulations using the updated CORDEX experimental protocol. To this end, the National Partnership for Climate Projections has been created to guide the delivery of a nationally coordinated approach to climate projections in Australia (Department of Climate Change, Energy, the Environment and Water, 2022; Grose et al., 2023). These new projections are especially important in the context of recent policy developments in response to climate change, including the National Climate Resilience and Adaptation Strategy, reforms to the Safeguard Mechanism, and the planned National Climate Risk Assessment.
The spatial resolution of GCMs used in CMIP5 and CMIP6 projections is still too coarse (∼50-250 km) for local and regional applications. Creating high-resolution (10-25 km) climate projections involves downscaling global climate projections, either statistically or dynamically. High-resolution models are generally more skillful in simulating extremes, such as heavy precipitation, strong winds, and severe storms (Gutowski et al., 2020). Dynamical downscaling is accomplished by running an RCM with lateral and initial boundary conditions from a GCM. While dynamical downscaling is computationally expensive, it may capture changes in extreme events in a more physically consistent way than a statistical model developed using the present-day climate (Komurcu et al., 2018). Previous studies of dynamical downscaling have shown it can add value over GCMs, particularly in areas with complex topography, and for extreme events (Giorgi, 2019; Grose et al., 2019). High-resolution simulations are generally not expected to correct large-scale errors from the GCMs, but to add realistic granularity with sub-GCM grid detail (Giorgi, 2019). These details can improve the representation of land-sea contrasts, rough terrain and a range of climate processes associated with these landscape features, such as orographic precipitation (Reder et al., 2020). This, however, depends on the domain size, and for large domains, RCMs may correct some large-scale GCM errors (Diaconescu & Laprise, 2013). While CMIP5-based downscaled simulations are still a valuable resource for analyzing future climate in Australia, updated downscaled simulations for CMIP6 are required as the GCMs, emission scenarios and underlying science have been updated.
In light of this, we have downscaled CMIP6 simulations using the Conformal Cubic Atmospheric Model (CCAM) to a 10 km resolution over the Australian continent using the CORDEX experimental protocol, creating the QldFCP-2 data set (Queensland Future Climate Projections 2), which will be available via CORDEX. However, prior to using these simulations for impact assessments and to support adaptation policies, we need to evaluate their performance in the present day, and determine whether they have improved upon the representation of the observed climate in the CMIP6 host GCMs.
The objectives of this research are threefold: 1. to evaluate the performance of the downscaled models and host models and determine the added value of high-resolution climate simulations; 2. to investigate improvements of downscaled climate simulations across different time frequencies (daily, monthly and seasonal); and 3. to reveal insights into the regional variation of the added value from high-resolution climate simulations across different climatic regions.
We do this by using CCAM to dynamically downscale 15 CMIP6 GCMs over Australia at a 10 km resolution, for the 1960-2100 period for three Shared Socioeconomic Pathways (SSPs: 126, 245, and 370). We then evaluate the performance of the downscaled models and the host models (GCMs) for the historical period, by comparing temperatures (minimum, mean and maximum) and precipitation with the Australian Gridded Climate Data (AGCD) observational data set across daily, monthly, and seasonal timescales. The assessment is done for the whole of Australia, the state of Queensland, and 12 different climatic regions within Queensland. This data set is currently the largest and highest-resolution ensemble of downscaled CMIP6 climate projections for the Australian continent and surrounds.

Study Area
The performance of the CCAM variable resolution global model was evaluated for Australia (Figure 1). Given the large size of the Australian continent, the climate varies considerably, including equatorial, tropical, sub-tropical, temperate, Mediterranean and arid regions. The interior of the continent is primarily arid, and most of the human population is concentrated along the coasts. Mountainous regions are found along the eastern coast of Australia, in the Great Dividing Range and Australian Alps. The 12 regional planning areas (RPA) of Queensland (QLD) were also used for model assessment, as they are large enough to be suitable for assessment with a 10-km model, while also allowing us to capture different climate zones in QLD. The RPA boundaries are available at the Queensland Spatial Catalog (Q-Spatial; https://qldspatial.information.qld.gov.au/catalogue/custom/detail.page?fid={322020F6-9395-44C7-9841-E998ADF6B76E}).

Experimental Setup
We used the Conformal Cubic Atmospheric Model (CCAM; Thatcher, 2020) developed by CSIRO (McGregor, 2005; McGregor & Dix, 2008) to dynamically downscale 15 CMIP6 GCMs. Typically, GCMs are downscaled by running an RCM over a limited domain of interest (Giorgi, 2019). CCAM, however, is a global stretched-grid model, and so runs for the entire globe, while the domain of interest can be at a higher resolution (McGregor, 2015; Syktus & McAlpine, 2016; Trancoso et al., 2022). In comparison to limited-domain RCMs, a stretched-grid model provides self-consistent interactions between global and regional scales (Fox-Rabinovitz et al., 2006). Rather than being driven through lateral boundaries, the regional atmosphere is influenced by large-scale changes in the climate predicted by the host GCM, while the atmosphere at small scales is allowed to evolve freely (Thatcher & McGregor, 2009). This means that CCAM will follow the host GCM for temperatures and winds at large spatial scales (e.g., 3,000 km or roughly the size of Australia), 1.5 km above the surface, as well as for surface pressure. Using this approach, CCAM can be used in a similar fashion to other RCMs, that is, by using high-frequency (3-6 hr) large-scale atmospheric variables from the host CMIP6 model, applied by spectral nudging to force CCAM. Two limitations of this approach are that there is only a limited number of CMIP6 experiments with high-frequency output for dynamical downscaling, and that the GCM's systematic biases can influence the downscaled climate. Alternatively, CCAM can also be used in downscaling via the AMIP (Atmospheric Model Intercomparison Project) approach (Gates et al., 1999), where the model is constrained by realistic (i.e., bias-corrected) sea surface temperatures (SSTs) and sea ice from the host CMIP6 GCMs and time-evolving CMIP6 radiative forcings (Eyring et al., 2016).
SST changes are one of the most important sources of uncertainty in multi-model climate projections (Kent et al., 2015) and all CMIP models exhibit significant regional biases, which are especially pronounced in the tropical Indo-Pacific, Southern Ocean and eastern boundary current regions (Ashfaq et al., 2011). The distribution of seasonal tropical precipitation in the Australian region is highly sensitive to local SSTs (Good et al., 2020; Watterson, 2020) and regional climate change depends not only on the magnitude of global warming, but also on the spatial patterns of warming (J. N. Brown et al., 2015; Huang & Ying, 2015). The climate of northern Australia is dominated by a highly seasonal distribution of precipitation with wet summers and dry winters (Klingaman et al., 2013). Climate change projections for northern Australian precipitation are highly uncertain, with no clear indication of even the direction of change (J. R. Brown et al., 2016; see also Box TS.g Figure 1a in IPCC (2021)) because of model biases (Catto et al., 2012; Richter, 2015). Therefore, to alleviate common issues with model bias in simulated SSTs, we adopted the methodology developed by Hoffmann et al. (2016), where the host CMIP6 model SST and sea ice are bias corrected. Bias-correcting SSTs prior to downscaling has been found to improve the simulation of climate in CCAM and in limited-domain RCMs (Hernández-Díaz et al., 2017; Hoffmann et al., 2016; Kim et al., 2020; Lim et al., 2019; Takhsha et al., 2018).
CCAM was run using a stretched C288 grid, with approximately 10 km resolution over Australia (see Figure 1b). CCAM was run in both atmospheric and coupled atmosphere-ocean versions (see Table 1). A total of 35 vertical levels in the atmosphere, and 30 in the ocean for the ocean-coupled models, were used (Thatcher et al., 2015). For the ocean-coupled models, spectral nudging was applied to the SSTs from the host model (Thatcher & McGregor, 2009). CCAM was forced using the CMIP6 radiative forcings for the historical period and the scenarios, which include solar, natural and anthropogenic aerosols, transient land use, ozone, and greenhouse gases, as described in Eyring et al. (2016), and by bias-corrected SSTs and sea ice from the host CMIP6 GCMs. The bias correction for the SSTs used in CCAM is described in Hoffmann et al. (2016).
Our downscaling method reduces the model spread and removes some systematic biases present in the host CMIP6 models. This, however, does not change the trends in temperature and patterns of change over time (Figure S1 in Supporting Information S1), and the global temperature anomaly is very similar in both the CMIP6 host models and the downscaled CMIP6-CCAM models.
The high-resolution downscaling was completed for the 1960-2014 historical period for 15 CMIP6 model runs from 12 different GCMs (Table 1) in total, including five simulations with the CCAM coupled atmosphere-ocean model. The ensemble size was dictated by the desire to estimate the probabilistic distribution of climate change on regional scales (Xie et al., 2015) and to mimic as much as practical the spread of the ensemble from global CMIP6 models while also selecting models that can represent the current climate of Australia well. For example, we selected two outlier models projecting the largest rainfall change in the Australian region, the driest (ACCESS-ESM1.5) and wettest (EC-Earth3), as well as several GCMs spread across the distribution of projected temperature and rainfall changes (Figure 2). Models were evaluated for the ability to represent Australia's climate from 1995 to 2014 (represented by the AGCD observational data set) using the Kling-Gupta Efficiency (KGE) score applied to precipitation and surface air temperature considering seasonal and monthly frequencies. The climate change signal was evaluated for precipitation and temperature at mid and end of century. The climate change signal and the KGE skill score for historical simulations were used to select the best performing ensemble runs from different GCMs with a representative climate change signal using the Skill-Spread-Selection algorithm (see Trancoso et al. (2023) for further details). To account for model independence, the Skill-Spread-Selection algorithm selects only one model from each modeling group (Figure 2). Our approach is consistent with a recent study on GCM selection accounting for performance, spread and independence (Merrifield et al., 2023). In addition, the model selection was guided by previous studies on model independence (Bishop & Abramowitz, 2013; Brunner et al., 2020; Evans et al., 2014) and our recent analysis detailing the design of the multi-model downscaling ensemble underpinning climate change services in Australia (Grose et al., 2023). The final ensemble size was expanded to include two pairs of coupled and atmosphere-only simulations and two additional representative (wettest and driest) ensemble members from the large ensemble (40 members) of ACCESS-ESM1.5 simulations.

Model Evaluation
We evaluated the performance of the GCMs and downscaled models by comparing their outputs with the Australian Gridded Climate Data (AGCD; D. A. Jones et al., 2009), previously known as the Australian Water Availability Project. AGCD provides daily gridded temperature and precipitation values on a 0.05° resolution grid. AGCD data are interpolated by applying a statistical model to station data. More weather stations are available along the coast, and there are gaps in central Australia, so the data quality is poorer in the sparsely observed interior. We chose not to mask out these areas from the analysis, as masking made no difference to our conclusions.
We evaluated minimum, maximum and average temperature and precipitation. Prior to evaluation, AGCD and CMIP6 host model data were regridded to the 10 km grid (the CCAM downscaling grid) using distance-weighted regridding for precipitation and bilinear interpolation for temperature. The climatological average from 1981 to 2010 was used to evaluate the model performance.
We evaluated model performance and the added value of downscaling using the Kling-Gupta efficiency (KGE; Gupta et al., 2009) and Perkins skill score (Perkins et al., 2007). The KGE combines the three components of Nash-Sutcliffe efficiency (correlation, bias and variability) into one metric, which is then used to rank models based on seasonal precipitation and temperature, and annual cycles of precipitation and temperature (Equation 1):

$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2} \tag{1}$$

where r is the correlation component (Pearson's correlation coefficient), β measures bias and is the ratio of estimated and observed means, and γ measures variability and is the ratio of estimated and observed coefficients of variation. A KGE value of 1 is the maximum skill, given by a perfect match with observations. Note that when calculating the ensemble mean of the subset, the mean of the three ACCESS-ESM1-5 variants was calculated first and this value was used in the subset ensemble mean, to ensure that ACCESS-ESM1-5 was not weighted more than the other models.
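As a minimal sketch of how the KGE described above could be computed for a pair of gridded climatologies, the following Python function applies the stated definitions of r, β and γ. The function name, the use of NumPy, and the handling of missing values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def kling_gupta_efficiency(sim, obs):
    """KGE from the correlation, mean ratio and coefficient-of-variation ratio.

    sim, obs: arrays of the same shape (e.g., seasonal climatology maps).
    Grid cells where either field is NaN are ignored (an assumption).
    """
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    valid = np.isfinite(sim) & np.isfinite(obs)
    sim, obs = sim[valid], obs[valid]

    r = np.corrcoef(sim, obs)[0, 1]                            # Pearson correlation
    beta = sim.mean() / obs.mean()                             # ratio of means (bias)
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())  # ratio of CVs
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)
```

A simulation identical to the observations scores 1, and a simulation that is exactly twice the observations scores 0 in this sketch, because r and γ remain 1 while β equals 2. Because β and γ are ratios, the score becomes unstable when the observed mean is close to zero.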
For the seasonal assessment, KGE was calculated for the four calendar seasons and at the annual scale, followed by an overall seasonal score obtained by averaging the five seasonal KGE values. The ability of models to reproduce observed annual cycles was evaluated using monthly data. Two metrics were calculated to represent the overall bias and the bias in the annual cycle amplitude. The first is given by the summation of absolute bias over the 12 months, and the second is given by the average absolute bias of the coldest and warmest months. The integrated KGE for the annual cycle is given by the average of the overall bias and the bias in the amplitude of the annual cycle.
In addition, we used the modified Perkins skill score to evaluate model performance at daily time steps (Perkins et al., 2007). The Perkins skill score evaluates models based on the similarity between the modeled and observed probability density functions (PDFs). The binning of data to construct histograms was based on the distribution of the observed data. The Perkins skill score is then calculated as follows:

$$\text{Perkins skill score} = \sum_{i=1}^{n} \min(Z_m, Z_o) \tag{2}$$

where n is the number of bins used to calculate the histogram, Z_m is the frequency of values in a given bin for the model, and Z_o is the frequency of values in a given bin for the observations. Temperature was binned at 0.5°C. Precipitation was binned as 0, 0.1, and then from 0.1 to 20.1 at 0.5 mm/day, and then at 10 mm/day up to 200.1 mm/day. If a model simulates the observed PDF poorly, the Perkins skill score will be close to zero, while if the PDF is well simulated the score will approach a maximum of 1. The Perkins skill score was used to assess the entire distribution of the climate variables as well as the lower and upper tails of the distributions (0-5th and 95-100th percentiles) to represent the ability of models to capture temperature extremes. For precipitation, we used the fraction of dry days instead of the 0-5th percentiles to represent the lower tail. Note that for evaluating the upper and lower tails, the scores were normalized to be within the range of 0-1, as for the Perkins score for the entire distribution.
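The histogram-overlap calculation behind the Perkins skill score can be sketched as follows. This is an illustrative Python version of the standard score from Perkins et al. (2007); the function name and the choice to express frequencies as fractions of all valid daily values are assumptions, and the tail normalization used in this study is not reproduced.

```python
import numpy as np

def perkins_skill_score(model, obs, bin_edges):
    """Sum over bins of the smaller of the modeled and observed relative frequencies.

    bin_edges should be derived from the observed data, e.g., 0.5 degC steps
    for temperature. Model values falling outside the bins add no overlap.
    """
    model = np.asarray(model, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    model = model[np.isfinite(model)]
    obs = obs[np.isfinite(obs)]

    counts_m, _ = np.histogram(model, bins=bin_edges)
    counts_o, _ = np.histogram(obs, bins=bin_edges)
    z_m = counts_m / model.size   # relative frequency per bin, model
    z_o = counts_o / obs.size     # relative frequency per bin, observations
    return float(np.minimum(z_m, z_o).sum())
```

For example, with `bin_edges = np.arange(0.0, 3.0, 0.5)`, identical samples give a score of 1, while a model sample lying entirely outside the observed bins gives 0.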

Added Value of Downscaling
Where the downscaled CCAM runs improve on KGE or Perkins score over the host GCM, dynamically downscaling the GCMs with CCAM has added value.
An integrated skill score, aiming to represent the overall performance of models in representing historical spatial patterns, the annual cycle and climate extremes, was also calculated using Equation 3, where ISS is the integrated skill score, KGE pr is the KGE score for precipitation data, and KGE temp is the average of the KGE of average, maximum and minimum temperature. KGE ACyc pr is the KGE for the annual cycle of precipitation, and KGE ACyc temp is the average of the KGE for the annual cycle of average, maximum and minimum temperature. The Perkins term for precipitation (* pr) is the average Perkins skill score for precipitation for the entire distribution and the 95th percentile, and the score for the fraction of dry days. The Perkins term for temperature (* temp) is the average Perkins skill score for average, maximum and minimum temperature for the entire distribution and the 5th and 95th percentiles.
Model performance was assessed for the entirety of Australia, and for the 12 QLD regional plan areas.
There are several different ways of calculating added value in the literature (Di Luca et al., 2015, 2016; Lloyd et al., 2020). Our approach is to examine multiple statistics and variables that are of most interest to impacts in the region. Therefore, we have focused on temperature and precipitation at daily and seasonal timescales, and the annual cycle, for mean as well as extreme values. This builds on previous approaches by using multiple metrics and timescales, rather than focusing on just one timescale (Ciarlo et al., 2021). However, in this paper we do not consider the climate change signal in the calculation of added value (Di Virgilio et al., 2020), and instead focus on how well the downscaled models add value to the simulation of the climate in the historical period.
We propose a more comprehensive way to represent the strength of regional modeling by explicitly accounting for extremes, annual cycle and spatial patterns of four climate variables. This is done by calculating the integrated added value, which is the net difference between the integrated skill scores of the downscaled model and host model (ISS_CCAM − ISS_host GCM), denoting the score increment in representing historical climate attributable to downscaling. Similarly, the % integrated added value, given by (ISS_CCAM − ISS_host GCM)/ISS_host GCM, represents the performance gain relative to the baseline score (host model) after downscaling. This new approach, combining multiple performance statistics, variables and time steps, was designed to better target added value by factoring in the performance of models in simulating long-term spatial patterns, the annual cycle, and extreme temperature and precipitation. The approach makes it possible to target specific regions, to identify strengths and weaknesses of models, and to compare performance increments across factors, ensembles, variables and regions.
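The two added-value quantities defined above reduce to simple arithmetic once the integrated skill scores are known. The following Python sketch is illustrative only: the function name and the example scores are hypothetical, and the guard against a non-positive host score is an added assumption, since a relative gain is not meaningful in that case.

```python
def integrated_added_value(iss_ccam, iss_host):
    """Return (absolute, percent) integrated added value of downscaling.

    absolute = ISS_CCAM - ISS_host
    percent  = 100 * (ISS_CCAM - ISS_host) / ISS_host
    """
    if iss_host <= 0:
        # Assumption: a relative gain is undefined for a non-positive baseline.
        raise ValueError("host ISS must be positive to express a percentage gain")
    absolute = iss_ccam - iss_host
    percent = 100.0 * absolute / iss_host
    return absolute, percent


# Hypothetical scores, not values from this study
absolute, percent = integrated_added_value(iss_ccam=0.6, iss_host=0.5)
```

With these hypothetical scores the absolute added value is 0.1 and the relative added value is 20%.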

Evaluation of CMIP6 and CMIP6-CCAM: Bias and RMSE
We evaluated the performance of the CMIP6 host models, and CMIP6-driven CCAM (hereafter CMIP6-CCAM), by comparing temperature and precipitation with AGCD. Figure 3 shows the ensemble mean of the bias in the CMIP6 and CMIP6-CCAM ensembles (see Figures S2-S25 in Supporting Information S1 for individual model bias). Downscaling with CCAM reduced bias relative to the host GCMs in annual averages of all variables. Temperatures were also improved by downscaling for Austral summer (DJF; December-February) and Austral winter (JJA; June-August), whereas the ensemble mean bias for precipitation was slightly worse in DJF and JJA after downscaling. The ensemble mean bias in daily mean temperatures in CMIP6-CCAM is smaller than in CMIP6 (−0.02 and 0.54°C respectively). The largest improvement in performance for CMIP6-CCAM over CMIP6 is in minimum temperatures (0.44 and 2.24°C respectively, calculated as an area average over the Australian continent). CMIP6-CCAM also improves upon CMIP6 for maximum temperature biases (0.37 and −0.99°C respectively); however, CMIP6-CCAM tends to be too warm in Western Australia and too cool in Northern Australia, whereas CMIP6 is too cool across most of the continent (Figure 6). Annual precipitation bias is improved by downscaling (bias of −4.38 mm/month vs. 7.39 mm/month).
In general, the bias in individual CMIP6-CCAM models was very similar, while there was more variation in the CMIP6 ensemble. For annual precipitation, individual CMIP6 models varied between too wet and too dry, whereas the CMIP6-CCAM models tended to be too dry over large parts of the continent (Figures S2-S7 in Supporting Information S1). In DJF (wet season), CMIP6-CCAM models were too dry in northern Australia and had a wet bias along the eastern coast. Many of the CMIP6 models were too dry along the northern coast, and too wet in inland areas. The CMCC-ESM2 model had a large wet bias over most of the continent, which, when downscaled, was mostly removed, though it became too dry along the northern coast, as per the other CMIP6-CCAM models. For most individual models, bias was reduced in DJF precipitation by downscaling, except for ACCESS-CM2, GISS-E2-1-G and MPI-ESM1-2-LR. In general, the atmosphere-only models performed better for DJF precipitation bias than the ocean-coupled models (average bias of −12.37 mm/month for ocean coupled vs. −7.73 mm/day for uncoupled). The ocean-coupled NorESM2-MM version outperformed the uncoupled version, while the uncoupled version of CNRM-CM6-1-HR outperformed the coupled version. There was less improvement in individual model bias for JJA precipitation, as bias was already low in most models. Individual model bias was improved with downscaling in six of the examined models in JJA, and the bias was larger or similar in the other models. In contrast to DJF, the ocean-coupled models generally performed better than the atmosphere-only models (average bias of 1.72 mm/month for ocean coupled vs. −4.72 mm/day for uncoupled). Downscaling also tended to reduce biases associated with topography, particularly in the Australian Alps in south-eastern Australia. Biases along the coast for annual and DJF precipitation were also reduced.
Downscaling improved the individual model bias in average annual temperature for all models, except for NorESM2-MM. The ocean-coupled version had the same bias as the host model, while the uncoupled version was slightly worse. For DJF, downscaling improved bias in all models except CNRM-CM6-1-HR, FGOALS-g3, GISS-E2-1-G, and NorESM2-MM. Bias calculated across the continent was similar in EC-Earth3 and GFDL-ESM4 after downscaling; however, prior to downscaling EC-Earth3 had large negative and positive biases, which canceled each other out and were reduced after downscaling. As for precipitation, the ocean-coupled NorESM2-MM outperformed the uncoupled version, while the CNRM-CM6-1-HR ocean-coupled version performed worse than the uncoupled version. The CMIP6-CCAM models tend to be too cool in eastern and northern Australia in DJF, and too warm in western and southern Australia. Downscaling also reduced biases associated with topography in the Australian Alps and Great Dividing Range. Biases along the coast were also reduced. The ocean-coupled models tended to have lower bias than the uncoupled models in DJF (mean bias of −0.16°C for the coupled version vs. −0.34°C for the uncoupled version), but higher biases in JJA (mean bias of 0.24°C for the coupled version vs. −0.02°C for the uncoupled version).
For annual minimum temperature, the individual model bias was improved except for the ocean-coupled version of CNRM-CM6-1-HR. The uncoupled version outperformed the host model. For DJF and JJA, all models were improved by downscaling except CNRM-CM6-1-HR. As for average temperature, the uncoupled version of CNRM-CM6-1-HR outperformed the coupled version, while the opposite occurred for NorESM2-MM. The ocean-coupled versions generally performed worse in both DJF and JJA than the uncoupled versions (mean bias of −0.31°C for the coupled version and −0.15°C for the uncoupled version in DJF, and 1.41 and 0.99°C respectively for JJA).
For annual maximum temperature, the majority of CMIP6 models were too cold. Downscaling improved bias in all models except EC-Earth3 and CNRM-CM6-1-HR. For DJF (summer), individual model bias improved in all models except the ACCESS-CM2 model and ACCESS-ESM1-5 variants r20 and r40. However, many of the CMIP6 models were too cool across most of the continent, while the CMIP6-CCAM models had hot and cold biases, which canceled each other out when calculating the country-wide average. The ocean-coupled versions of NorESM2-MM and CNRM-CM6-1-HR performed better than the uncoupled versions. The uncoupled versions generally had lower mean biases than the coupled versions in DJF for maximum temperature (mean bias of 0.62°C for the coupled version and 0.16°C for the uncoupled version); however, this was partly due to cancellation of biases in the uncoupled models, which tended to have more negative biases in north-eastern Australia than the coupled versions. For maximum temperature in JJA (winter), downscaling improved the bias in all models. The ocean-coupled NorESM2-MM outperformed the uncoupled version, while the CNRM-CM6-1-HR ocean-coupled version performed worse than the uncoupled version. The uncoupled models in general had lower mean biases than the coupled models (mean bias of 0.15°C for the coupled version and 0.08°C for the uncoupled version). For maximum and minimum temperature, downscaling reduced biases along the coast, and those associated with topography in the Australian Alps and Great Dividing Range, along the eastern coast of Australia. However, for minimum temperature, some biases remained in DJF in the Australian Alps.
We also examined the correlation and the root mean square error (RMSE) for the CMIP6 and CMIP6-CCAM ensembles for Australia (Figure 4). There is less spread within the CMIP6-CCAM ensemble than in the host GCMs. Downscaling with CCAM improved the RMSE for all variables. Correlation also improved for all variables except maximum temperature.
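For readers who want to reproduce this kind of comparison, the two statistics can be sketched as follows. This is a generic illustration using the standard definitions of RMSE and the Pearson correlation coefficient on paired values; whether the pairs are grid cells, months, or both, and the sample values themselves, are assumptions here and not taken from this study.

```python
import math


def rmse(model, obs):
    """Root mean square error between paired model and observed values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def pearson_r(model, obs):
    """Pearson correlation coefficient between paired values."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


# Hypothetical values: the model reproduces the pattern but is offset by +1.
obs = [1.0, 2.0, 3.0, 4.0]
model = [2.0, 3.0, 4.0, 5.0]
print(rmse(model, obs), pearson_r(model, obs))
```

The example shows why both statistics are reported: a constant offset leaves the correlation perfect while still producing a non-zero RMSE.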

Perkins and KGE Skill Score
To evaluate the impact of downscaling on daily values, we also calculated the Perkins score (see Figure 5 for temperature variables, and Figure 6 for precipitation). For average temperature, there is little improvement in the Perkins score with downscaling, as the score for CMIP6 models is already generally fairly high (>0.9), and the ensemble average declines slightly by 0.01 (0.09%). For maximum and minimum temperatures, downscaling improves the Perkins score for most models; however, the scores are again generally fairly high. The annual ensemble average improved by 0.02 (2%) and 0.08 (8%) respectively. For minimum temperature, the Perkins score also improved for DJF and JJA, with the ensemble average improving by 0.14 (18%) and 0.10 (14%), respectively. For maximum temperature, although the ensemble average declined by 0.003 (0.4%) in DJF, it improved by 0.06 (7%) in JJA. The biggest differences occur in the extremes. For maximum temperature, downscaling improves the 95th percentile Perkins score for all models in JJA, with the ensemble average improving by 0.39 (87%). However, the score declines in DJF, where the ensemble average fell by 0.14 (22%). The annual score also declined by 0.06 (9%). For minimum temperature, the Perkins score for the lower tail of the distribution (fifth percentile) improves in all seasons. In DJF, the ensemble average improved by 0.59 (165%), and in JJA it improved by 0.33 (201%), although the actual scores are higher in DJF than in JJA. Annually, the Perkins score improves by 0.41 (142%). The ocean-coupled models performed slightly worse than the uncoupled models for extreme minimum temperatures, and similarly to the uncoupled models for extreme maximum temperatures.
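The Perkins score measures the overlap between two empirical probability distributions: both samples are binned on a common set of bins and the smaller of the two relative frequencies is summed over all bins, so a score of 1 indicates identical binned distributions and 0 indicates no overlap. A minimal sketch of that calculation follows. The bin width of 1 and the sample values are illustrative assumptions, not the settings used in this study, and the tail (fifth and 95th percentile) variants are not reproduced here.

```python
import math


def perkins_score(model, obs, bin_width=1.0):
    """Overlap of two binned empirical distributions (0 = none, 1 = identical).

    bin_width is an illustrative choice, not the value used in the study.
    """
    def relative_frequencies(values):
        counts = {}
        for v in values:
            k = math.floor(v / bin_width)  # index of the bin containing v
            counts[k] = counts.get(k, 0) + 1
        return {k: c / len(values) for k, c in counts.items()}

    freq_model = relative_frequencies(model)
    freq_obs = relative_frequencies(obs)
    # Bins absent from either sample contribute zero overlap.
    return sum(min(freq_model.get(k, 0.0), f) for k, f in freq_obs.items())


# Hypothetical daily temperatures (degrees C).
obs = [10.2, 11.5, 12.1, 13.7]
model = [10.4, 11.9, 13.2, 14.8]
print(perkins_score(model, obs))
```

Because the score depends only on the binned frequencies, it is insensitive to the timing of individual days, which is why it complements the bias, RMSE and correlation statistics.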
For precipitation, CCAM improves the fraction of dry days for most models and seasons. The ensemble average score for the fraction of dry days improves by 0.1 (46%) in DJF and 0.08 (46%) in JJA. All models (downscaled and host) underestimate the number of dry days; however, this bias is significantly reduced with downscaling (Figure 6). The Perkins score across the entire distribution for precipitation is similar for CMIP6-CCAM and CMIP6, and the ensemble average score improves by only 0.012 (2%). For the majority of models, however, downscaling improves the Perkins score for extreme precipitation (95th percentile) in all seasons. The ensemble average score improves by 0.29 (47%) in DJF and 0.25 (52%) in JJA. Annually, the score improves by 0.17 (29%). The ocean-coupled models tended to have slightly higher Perkins scores for the 95th percentile than the uncoupled models, though they also tended to do slightly worse for the fraction of dry days.
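The dry-day comparison can be sketched as follows. The 1 mm/day wet-day threshold and the daily series are assumptions chosen for illustration; the threshold used in this study is not restated in this section. The difference is taken as model minus observation, matching the convention stated for Figure 6, so a negative value means the model has too few dry days.

```python
def dry_day_fraction(daily_precip_mm, wet_threshold_mm=1.0):
    """Fraction of days with precipitation below the wet-day threshold.

    wet_threshold_mm = 1.0 is an assumed, commonly used value.
    """
    dry_days = sum(1 for p in daily_precip_mm if p < wet_threshold_mm)
    return dry_days / len(daily_precip_mm)


# Hypothetical 10-day series (mm/day): observations have a few heavy events,
# while the coarse model spreads light rain over most days.
obs = [0.0, 0.0, 0.0, 12.0, 0.0, 0.0, 25.0, 0.0, 0.0, 0.0]
coarse_model = [1.5, 2.0, 0.5, 6.0, 1.2, 0.0, 9.0, 2.5, 0.3, 1.1]

difference = dry_day_fraction(coarse_model) - dry_day_fraction(obs)
print(dry_day_fraction(obs), dry_day_fraction(coarse_model), difference)
```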
We summarized the added value of the CCAM downscaling using the integrated skill score, which combines both KGE and Perkins scores. The average Perkins score for precipitation includes the Perkins score across the entire distribution and the 95th percentile, while the average score for temperature includes the scores for minimum, maximum, and average temperature, and the tails of the distribution. Figure 7 shows the KGE and average Perkins scores. Downscaling improved seasonal and daily temperature for all models, with the ensemble average improving by 0.08 (10%) and 0.12 (18%) respectively. The annual cycle of temperature improves for all models except CNRM-CM6-1-HR, with an ensemble average improvement of 0.06 (6%). In this case, the ocean-coupled version of CNRM-CM6-1-HR performs better than the uncoupled version. Seasonal precipitation improved with downscaling for all models, with an ensemble average improvement of 0.23 (43%). The annual cycle of precipitation improved for the majority of models but worsened for four models; the ensemble average improvement was 0.09 (13%). The average Perkins score for precipitation also improved for the majority of models with downscaling, with the ensemble average improving by 0.10 (12%). The integrated skill score improved for all models except GFDL-ESM4, owing to the lack of improvement in the annual cycle of precipitation and the Perkins score for precipitation. GFDL-ESM4 was one of the higher-resolution host models; however, other CMIP6 host models with a similar resolution, such as CNRM-CM6-1-HR and NorESM2-MM, experienced large improvements in performance with downscaling. The ensemble average score improved by 0.11 (16%) with downscaling. The ocean-coupled models tend to have higher overall scores than the uncoupled models, mainly due to a higher Perkins score for precipitation and a higher score for the annual cycle of precipitation.
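As a rough illustration of how these components fit together, the sketch below computes a Kling-Gupta efficiency and then averages six component scores, following the description in the Figure 7 caption that the integrated skill score is the average of six values. The KGE form shown is the original Gupta et al. (2009) formulation, which is an assumption: the variant used in this study is defined in its methods and may differ. All component names and values are hypothetical.

```python
import math
from statistics import fmean, pstdev


def kge(sim, obs):
    """Kling-Gupta efficiency, original 2009 form (assumed here).

    Combines correlation r, variability ratio alpha and mean ratio beta;
    a perfect simulation scores 1.
    """
    mu_s, mu_o = fmean(sim), fmean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = fmean((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs))
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def integrated_skill_score(components):
    """Unweighted mean of the six component scores."""
    if len(components) != 6:
        raise ValueError("expected six component scores")
    return fmean(components.values())


# Hypothetical component scores for one model.
components = {
    "kge_seasonal_temperature": 0.8,
    "kge_seasonal_precipitation": 0.6,
    "kge_annual_cycle_temperature": 0.9,
    "kge_annual_cycle_precipitation": 0.7,
    "perkins_temperature": 0.9,
    "perkins_precipitation": 0.5,
}
print(kge([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0]))
print(integrated_skill_score(components))
```

An unweighted mean gives temperature and precipitation, and the three time scales, equal influence on the final score, so a model that fails to improve on one or two components can still show a higher integrated score.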
We assessed the model performance for the IPCC regions for Australia (Figure 8) and the 12 QLD RPA (Figure 9 and Figures S27 and S28 in Supporting Information S1). For all regions in Australia, the integrated skill score improved. The smallest improvement was in Northern Australia. A similar pattern emerged for the QLD regional plan areas. For nine of the 12 regions, the integrated skill score improved for all models, with the ensemble average for each region improving by 0.05-0.40 (9%-150%; see Figure 9). For the Cape York Region, GFDL-ESM4 was the only model that was not improved by downscaling, though the decline in the integrated skill score was small at 0.0004 (0.08%). For the Gulf Region, three models were not improved by downscaling, with decreases in the integrated skill score of between 0.03 and 0.08 (5%-11%). For the Southwest Region, two models were not improved by downscaling, with decreases in the integrated skill score of between 0.02 and 0.06 (3%-8%). Cape York and the Gulf Region are both northern areas that include coastal areas. These areas tended to have a stronger precipitation bias after downscaling than the rest of the continent. For the Southwest Region, the poor performance of two models after downscaling is primarily due to temperature. The largest model improvements were found in the densely populated South East QLD region, with a relative integrated added value of 150%, due to large improvements in the simulation of both temperature and precipitation.
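The paired absolute and percentage changes reported throughout these results can be related with a short sketch. It assumes the percentage is the change in score relative to the host model score, which is consistent with the reported pairs but is not restated in this section; the function names and the first example's values are hypothetical.

```python
def relative_added_value(host_score, downscaled_score):
    """Percentage change of the downscaled score relative to the host score.

    Assumes a positive host score; this definition is an assumption.
    """
    if host_score <= 0:
        raise ValueError("host score must be positive for a relative change")
    return 100.0 * (downscaled_score - host_score) / host_score


def implied_host_score(absolute_change, percent_change):
    """Host score implied by a reported absolute and percentage change."""
    return absolute_change / (percent_change / 100.0)


# Hypothetical scores: 0.40 for the host model, 0.60 after downscaling.
print(relative_added_value(0.40, 0.60))

# Reported DJF fifth-percentile minimum temperature change: 0.59 (165%).
print(implied_host_score(0.59, 165.0))
```

Under this assumption, very large percentages such as 165% or 201% arise when the host model score is low, so absolute and relative changes are best read together. Values backed out this way are approximate because the reported numbers are rounded.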

Discussion
We compared the performance of CMIP6 and dynamically downscaled CMIP6 models (CMIP6-CCAM) to identify whether dynamical downscaling adds value to the simulation of the Australian climate. We found that when downscaling CMIP6, CCAM improved the performance for both seasonal and daily temperatures, and the annual cycle of temperature and precipitation. CCAM downscaling also improved the fraction of dry days. The biggest improvements were found in extremes, with CCAM improving extreme minimum temperatures in all seasons, and improving precipitation extremes in DJF and JJA. Downscaling also reduced temperature and precipitation biases in mountainous regions along the eastern coast of Australia and led to large improvements in the densely populated South East QLD region. Downscaling CMIP6 with CCAM therefore improved the simulation of the present-day climate and provides a useful new data set for studying climate in Australia. The QldFCP-2 downscaled future projections will also provide vital information for adapting and responding to climate change in Australia.
There was more similarity in the climate of the downscaled models than in the climate of the host models for the historical period. This is due to the bias correction applied to SSTs, which reduced the spread in the simulation of temperature and precipitation in the downscaled models compared to the host models. If the SST bias correction had not been applied, we would expect the downscaled models to have a similar distribution of global temperatures to the host models. Further, as CCAM is a stretched-grid model, the domain is large, so the host GCMs exert comparatively less control over the simulation than if the domain were smaller, and CCAM physics dominate the climate (Diaconescu & Laprise, 2013). However, the climate change patterns and trends in the downscaled models will still behave in a similar way to those in the host GCMs (Figure S1 in Supporting Information S1), so while the downscaled models are similar in the present day, they will have varying climate change impacts. A future study will examine climate change impacts in the downscaled models.
The largest improvement with downscaling was found in minimum temperature biases, where downscaling with CCAM improved mean and extreme minimum temperatures. A better performance for minimum temperatures than maximum temperatures has been found for CCAM in other domains (Katzfey et al., 2016). This behavior may be due to the soil and vegetation characteristics in the CABLE land surface model (Di Virgilio et al., 2019; Wang et al., 2011). The current version of CCAM uses a small number of vegetation classes for Australia, so part of the maximum temperature bias may be due to issues with albedo, which is a combination of soil albedo and vegetation characteristics. The roughness length and leaf area index, which are also linked to the vegetation classes, may be a further factor in the maximum temperature bias, as may differences between model and observations in how maximum temperature is calculated, that is, within canopy in CCAM versus over cleared vegetation for observed data. Downscaling improved the 95th percentile of daily precipitation and the fraction of dry days. CCAM has been found to accurately simulate light precipitation in previous studies (Di Virgilio et al., 2019; Nguyen et al., 2014). The CMIP6 models generally had too many low-precipitation days and not enough dry days, a common issue in parameterized-convection models (Prein et al., 2015) and coarse-grid models. CCAM has a scale-aware convection scheme; however, at a 10 km resolution the model still relies on the convection parameterization scheme, like the CMIP6 models. The higher resolution may still account for some of the improvement in the fraction of dry days, as the large grid box size in the GCMs means rain occurs at a low amount over a large area. CCAM's improvement in simulating dry days may also be due to recent improvements in CCAM's convection and aerosol schemes (Di Virgilio et al., 2019).
Two host models were downscaled with both ocean-coupled and uncoupled versions of CCAM, and in total five models were run in ocean-coupled mode. The uncoupled versions had prescribed, bias-corrected SSTs, an approach that has been found to reduce biases when downscaling (Hoffmann et al., 2016; Takhsha et al., 2018; Xu & Yang, 2015). However, the lack of air-sea interaction may also introduce new biases (Toh et al., 2018). The uncoupled models tended to have lower biases than the coupled models; however, the overall skill score tended to be higher for the ocean-coupled models, due to higher scores for the annual cycle of precipitation and the Perkins score for precipitation. For the two models that had both uncoupled and ocean-coupled versions, we did not find consistent results. For most variables examined in this paper, the NorESM2-MM model was better with ocean coupling than without. For the CNRM-CM6-1-HR model, ocean coupling degraded performance slightly for most variables. Other aspects of simulated climate, such as tropical cyclones, MJO-related synoptic circulation and land-sea contrast, may benefit from coupling and deserve more attention.
The added value of using downscaled projections has been actively researched in the regional climate modeling literature, with differing propositions and assessment goals. Di Luca et al. (2016) used two metrics representing the absolute values (or mean square error) and spatial pattern (spatial correlation) of 20-year climatological averages, in addition to three metrics focused on mean climate, standard deviation and 99th percentile. In recent years, there has been a strong focus on assessing the value of high-resolution models in terms of how well they simulate climate extremes. For example, Ciarlo et al. (2021) used daily data and the Perkins skill score to quantify the added value based on climate extremes on a grid-cell basis. Other approaches have focused on process-oriented evaluation of added value (Tamoffo et al., 2020). In our analysis, we extended the approach pioneered by Di Luca et al. (2016) and complemented it with elements from Ciarlo et al. (2021). We included the annual cycle (monthly data) and probability distribution functions of daily data for four variables in addition to seasonal averages. The Australian climatic zones are characterized by strong seasonal contrasts between winter-dominated (e.g., southern Australia) and summer-dominated (e.g., northern Australia) climate regimes, so a metric targeting these processes is essential. It is also well established that GCMs tend to overestimate the number of light rainfall days and underestimate both the number of dry days and higher rainfall events (e.g., >40 mm/day). By using the Perkins score, its 5th and 95th percentile variants, and the number of dry days, our evaluation targeted these specific features where RCMs are expected to add value over GCMs. By accounting for three time scales, and specifically assessing the tails of the distributions, our approach focuses on the expected strengths of high-resolution simulations, resulting from spatial details of topography, vegetation, coastline and mesoscale circulation.
Previous downscaling experiments in Australia have downscaled CMIP5 GCMs using CCAM and the WRF model (Di Virgilio et al., 2020; Nishant et al., 2021; Syktus, Toombs, et al., 2020). CORDEX-Australasia found added value from downscaling for mean precipitation, but unclear results for extreme precipitation (Di Virgilio et al., 2020). Downscaling generally added value for maximum temperature, and improved mean and extreme cold temperatures (Di Virgilio et al., 2020). The VCP19 (Victorian Climate Projections 2019) experiment, which downscaled six CMIP5 GCMs to 5 km for Victoria using CCAM, found that downscaling improved detail and reduced bias for temperature and rainfall, particularly in mountainous regions (Clarke et al., 2019). VCP19 also found some improvements to extreme rainfall, particularly in mountainous areas where the mountains were not resolved in the global model (Clarke et al., 2019). Our results show that downscaling the CMIP6 GCMs to 10 km generally improved extremes for precipitation and minimum temperatures. For maximum temperature, mean bias was improved, though the Perkins score was generally lower in DJF. While added value was analyzed differently in the Di Virgilio et al. (2020) study, and the starting host models were different, these results suggest that our 10 km data set may offer important improvements to extremes over the 50 km CORDEX-Australasia data set. Our data set, QldFCP-2, also has an advantage over the VCP19 and NARCliM (NSW and ACT Regional Climate Modeling project) data sets in that it is available Australia-wide and uses the latest set of CMIP6 GCMs.

Conclusion
A large ensemble (15 simulations) of high-resolution downscaled simulations at continental scale with CMIP6 models was completed for three SSP scenarios (126, 245, and 370) for Australia. This QldFCP-2 data set is the largest ensemble of high-resolution downscaled climate projections for CMIP6 for Australia. We evaluated both the downscaled simulations and host model simulations, using two new integrative metrics designed to assess model performance and the added value of downscaling, accounting for long-term spatial patterns, annual cycle and climate extremes. Downscaling the CMIP6 models with CCAM improved performance across most metrics and seasons and provided improvements in metrics relevant for impact assessment. The improvements in mountainous and coastal areas will be particularly important for impact studies, due to the high concentration of population along the coast. These improvements are due to a combination of the higher spatial resolution and the ability to resolve complex topography and land-sea contrasts. Bias correction of SSTs and the improvements in precipitation generally found in CCAM also likely contributed to improved simulation in these areas. The improvements to extreme precipitation will be important when looking at climate change impacts, and will be useful for hydrological impact studies, including studies of impacts on flooding and soil erosion. The QldFCP-2 data set therefore provides a valuable new resource for understanding climate change impacts in Australia, and it is the first of its kind based on CMIP6 models.

Data Availability Statement
The downscaled CCAM data is being published via the CORDEX Australasia domain archive. The data used in this study is also available online (Chapman et al., 2023). The CMIP6 data used in this study are available through the Earth System Grid Federation at: http://esgf.llnl.gov/.

Figure 1. Study area for the evaluation of downscaled climate simulations. (a) Domain of analysis and elevation, derived from SRTM (Shuttle Radar Topography Mission; Gallant et al., 2009). mASL, meters above sea level. Twelve regional planning areas in the state of Queensland (QLD) were selected for assessment. CY, Cape York; GR, Gulf Region; FN, Far North; NW, Northwest; NQ, Northern QLD; CW, Central West; MIW, Mackay, Isaac, Whitsunday Region; CQ, Central QLD; SW, Southwest; DD, Darling Downs; WB, Wide Bay Burnett Region; SEQ, Southeast QLD. (b) Conformal Cubic Atmospheric model stretched grid, centered over Australia.

Figure 2. Climate change signal (SSP370, 2080-2099 vs. 1981-2010) on annual temperature and precipitation in the CMIP6 ensemble (first realization only), and the models selected for downscaling. Colors for subsetted models show historical precipitation from 1981 to 2010 for all of Australia. Note that for calculating the ensemble mean of the subset, to ensure that ACCESS-ESM1-5 was not weighted more than the other models, the mean of the three ACCESS-ESM1-5 variants was calculated, and this value was used in the calculation of the subset ensemble mean.

Figure 3. Annual averages of monthly mean, maximum and minimum temperature (°C), and precipitation (mm/month) from observations and ensemble mean bias (model-AGCD) for the CMIP6 host models, and the downscaled CMIP6-CCAM ensemble. Time period 1981-2020. Comparison performed on a common 10 km grid.

Figure 4. Annual average of root mean square error and correlation for precipitation, maximum, minimum and average temperature from CMIP6 global climate models and CMIP6-CCAM models, in comparison with AGCD for the period 1981-2010 over Australia.

Figure 5. Heatmap of Perkins score for CMIP6 and CMIP6-CCAM as compared to AGCD across all of Australia, 1981-2010. First row shows temperature (Perkins score for entire distribution). Second row shows maximum temperature with Perkins scores for entire distribution and the 95th percentile of distribution for CMIP6 (CMIP6.95) and CCAM (CCAM.95). Third row shows minimum temperature with Perkins scores for entire distribution and fifth percentile of distribution for CMIP6 (CMIP6.95) and CCAM (CCAM.95). Values in cells are rounded to two decimal places.

Figure 6. Heatmap of difference in the fraction of dry days (row 1), and the Perkins score (row 2) for CMIP6 and CMIP6-CCAM as compared to AGCD across all of Australia, 1981-2010. Difference in the fraction of dry days calculated as Model-AGCD. Perkins scores are shown for entire distribution and for the 95th percentile of distribution for CMIP6 (CMIP6.95) and CCAM (CCAM.95). Values in cells are rounded to two decimal places.

Figure 7. Added value of Conformal Cubic Atmospheric model downscaling of CMIP6 over Australia. Panels show Kling-Gupta Efficiency (KGE) score for seasonal temperature and precipitation and annual cycle of temperature and precipitation, the Perkins skill score for daily temperature and precipitation, and the overall skill score which combines KGE and Perkins. Time period 1981-2010. Temperature results are the average of KGE for maximum, minimum and average temperature. Perkins score for precipitation is the average of score of entire distribution and 95th percentile. Perkins score for temperature is the average of fifth, 95th and entire distribution score for average, maximum and minimum temperature. The Integrated Skill Score is the average of the above six values.

Figure 8. Added value of Conformal Cubic Atmospheric model downscaling of CMIP6 over IPCC regions in Australia. Panels show integrated skill score, score for temperature and score for precipitation. Individual models shown as points.

Figure 9. Added value of Conformal Cubic Atmospheric model downscaling of CMIP6 over regional planning areas in Queensland. Panels show integrated skill score, score for temperature, and score for precipitation.