A context-free genetic programming (GP) method is presented that simulates local scale daily extreme (maximum and minimum) temperatures based on large scale atmospheric variables. The method evolves simple and optimal models for downscaling daily temperature at a station. The advantage of the context-free GP method is that both the variables and the constants of the candidate models are optimized, and consequently so is the selection of the optimal model. The method is applied to the Chute-du-Diable weather station in Northeastern Canada along with the National Center for Environmental Prediction (NCEP) reanalysis datasets. The performance of the GP based downscaling models is compared to benchmarks from a commonly used statistical downscaling model. The experimental results show that the models evolved by the GP are simpler and more efficient for downscaling daily extreme temperature than the common statistical method. The model test results indicate that the GP approach significantly outperforms the statistical method for the downscaling of daily minimum temperature, while for the maximum temperature the two methods are almost equivalent, with the GP method remaining slightly more effective.
 The General Circulation Models (GCMs) used to simulate the present climate and project future climate are generally not designed for local or regional climate change impact studies. While they demonstrate significant skill at the continental and hemispherical scales and incorporate a large proportion of the complexity of the global system, they are inherently unable to represent local subgrid-scale features and dynamics owing to their coarse spatial resolution (on the order of 300 km × 300 km). Therefore, in many climate change impact studies, there is a need to convert the GCM outputs into higher spatial resolution scenarios. High spatial resolution climate change scenarios are particularly needed to improve climate change impact assessment at local to regional scales. Recent advances in climate change impact studies have shown the necessity of high resolution climate change scenarios [Stone et al., 2003; Mearns et al., 1999]. The various methods used to generate high-resolution climate change scenarios are referred to as downscaling techniques.
 Several downscaling techniques have been proposed in the literature [Xu, 1999]. However, in practice, two major approaches appear well established at the moment, namely dynamic downscaling and empirical (or statistical) downscaling. The former is a method of extracting local-scale information by developing and using limited-area models (LAMs) or regional climate models (RCMs) with the coarse GCM data used as boundary conditions. The basic steps are to use the GCMs to simulate the response of the global circulation to large-scale forcing, and the RCM to account for sub-GCM grid scale forcing such as complex topographical features and land cover inhomogeneity, and thus to enhance the simulation of atmospheric circulations and climate variables at fine spatial scales. Although RCMs appear to be the most informative downscaling approach, they also have several limitations. RCMs still require considerable computing and human resources, and they are as expensive to run as GCMs [Xu, 1999]; thus they may not be the first choice for small local projects.
 The latter approach (empirical downscaling) seeks to derive the local scale information from the larger scale through inference from the cross-scale relationship, using some random or deterministic functions. Regression-based downscaling methods rely on a direct quantitative relationship between the local scale climate variable (predictand) and the variables containing the larger scale climate information (predictors) through some form of regression function. To date, linear and non-linear regression, artificial neural networks, canonical correlation and principal component analysis have all been used to derive predictor-predictand relationships [Xu, 1999]. Even though it is not yet clear which method provides the most reliable and accurate downscaling results [Schoof and Pryor, 2001], the most widely used empirical downscaling method is the Statistical Down-Scaling Model (SDSM) [Wilby et al., 2002], which implements a simple linear regression. This likely indicates the preference of users for simple and easy to use downscaling methods.
 The purpose of this paper is to identify simple and optimal multivariate downscaling models using an evolutionary algorithm, namely genetic programming (GP). The GP approach is used to perform an automatic induction of simple downscaling models. The paper also aims to highlight the applicability of the GP technique as a simple and effective downscaling method for daily extreme (minimum and maximum) temperature estimates. We specifically focus on assessing grammar-based GP with variable and constant optimization. In addition, emphasis is given to evaluating and comparing the GP approach with the most commonly used regression-based downscaling method. GP has been successfully used to evolve appropriate models for various time series modeling problems, including rainfall-runoff modeling [Whigham and Crapper, 1999] and the modeling of groundwater level fluctuations [Hong and Rosen, 2002]. However, none of these applications has considered spatial downscaling of climate variables.
 Genetic programming (GP), introduced by Koza [1992], is a method for constructing populations of mathematical models (function trees) using stochastic search methods, namely evolutionary algorithms. For multivariate time series modeling using the GP approach, the ultimate objective of the evolutionary process is to discover an optimal equation (or model) relating the dependent variable (or predictand) to the independent variables (or predictors). However, as the search space of all possible equations is extremely large, particularly for multivariate time series, the heuristic search has to be restricted to only those equations that follow a specified grammar (or set of rules used to build equations). The GP system described here uses a context-free grammar [Whigham, 1995] along with the Powell optimization method [Press et al., 1992]. The description of the GP method is limited herein to the needs of the present study. For a more detailed description of the GP method, readers are referred to other sources, such as Whigham and Keukelaar and Babovic and Keijzer [2000]. The grammar determines how equations can be generated and which functions and variables can be included in an equation. The grammar is used to build equations as function trees. Figure 1 shows an example of a function tree (or derivation tree) based on a grammar named expression (exp). In the tree, the inner connection points (or inner nodes) are binary functions (e.g., ‘+’, ‘/’, ‘−’), also called non-terminals, which require two arguments or sub-trees, whereas the leaf nodes are called terminals and represent external input variables and constants. In Figure 1, the whole tree represents the equation f(x, y) = x + (y × 5). Whatever the type of grammar used, it is essential to specify the list of functions and terminals that will be used to create the derivation trees. 
A maximum depth of the derivation tree (or maximum expression depth) must also be set to halt the generation process and prevent the evolution of overly complex (or overly large) models. A higher maximum expression depth means that more complex equations are likely to be produced.
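As an illustration of the function-tree representation described above, the following minimal Python sketch encodes the tree of Figure 1 and evaluates it. The class name `Node`, the helpers `evaluate` and `depth`, and the particular function set are assumptions made for this example; they are not taken from the GP system used in this study.

```python
from dataclasses import dataclass
from typing import Dict, Union
import operator

# Binary functions allowed at inner nodes (an illustrative set chosen
# for this sketch, not the function list of the study).
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass
class Node:
    """Inner node (non-terminal): a binary function with two sub-trees."""
    op: str
    left: "Tree"
    right: "Tree"


# A leaf (terminal) is either a variable name or a numeric constant.
Tree = Union[Node, str, float]


def evaluate(tree: Tree, env: Dict[str, float]) -> float:
    """Recursively evaluate a function tree for the given variable values."""
    if isinstance(tree, Node):
        return FUNCTIONS[tree.op](evaluate(tree.left, env),
                                  evaluate(tree.right, env))
    if isinstance(tree, str):
        return env[tree]          # variable terminal
    return float(tree)            # constant terminal


def depth(tree: Tree) -> int:
    """Depth of a tree; a single terminal has depth 0."""
    if isinstance(tree, Node):
        return 1 + max(depth(tree.left), depth(tree.right))
    return 0


# The tree of Figure 1: f(x, y) = x + (y * 5)
figure1 = Node("+", "x", Node("*", "y", 5.0))
print(evaluate(figure1, {"x": 2.0, "y": 3.0}))  # 2 + 3 * 5 = 17.0
print(depth(figure1))                            # 2
```

A maximum expression depth can then be enforced simply by rejecting any candidate tree whose `depth` exceeds the chosen limit.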
 The next step after setting up the grammar is to generate the initial population. In this case, the grow method [Koza, 1992] is used to create random individuals (or trees) for the initial generation. The shape of the optimal tree is found using an evolutionary approach as follows:
 (i) Each individual (or equation) of the initial generation is evaluated and a score (or fitness) is determined for each population member. Here the score of each individual is based upon the optimization method, the error function and the parsimony. The latter is a weighting added to an individual depending on its tree-size. This term is used to bias the selection towards shorter equations that have the same prediction accuracy as longer equations. Then the best scoring members are selected for reproduction.
 (ii) Once individuals have been selected as parents for reproduction, three genetic operators (crossover, mutation, and direct reproduction) may be applied to generate a new population. A new population may also be generated through some combination of the three operators. The crossover operator generates offspring (or a new population of equations) by combining parts of the parents. This is done by randomly selecting a sub-tree within each of the parents and then swapping them. Crossover produces a new population, but it does not introduce any new information into the population. Thus the population can become more and more homogeneous, leading to premature convergence to a non-optimal solution. To guard against such premature convergence, a mutation operator is often introduced. There are various types of computational mutations (e.g., branch-mutation, node-mutation, etc.) that may be used to introduce new information into the population [Babovic and Keijzer, 2000].
 (iii) Then, the process goes back to scoring the new population members. The iterative process stops when the specified maximum number of generations is reached.
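Steps (i) to (iii) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the GP system used in this study: trees are encoded as nested tuples, the grammar is reduced to three binary functions, the score is the RMSE plus a hypothetical parsimony weight, and the operator rates (0.7 crossover, 0.2 mutation, 0.1 direct reproduction), the population size and the elitism step are arbitrary choices. Selection picks the best member of a random group of six and the run stops after six generations, following the notes to Table 1. All function names are hypothetical.

```python
import operator
import random

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
VARS = ["x", "y"]


def grow(max_depth, rng):
    """Grow method: each node is randomly a function or a terminal."""
    if max_depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return rng.choice(VARS)
        return round(rng.uniform(-5.0, 5.0), 2)
    op = rng.choice(list(FUNCS))
    return (op, grow(max_depth - 1, rng), grow(max_depth - 1, rng))


def evaluate(tree, env):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, env), evaluate(right, env))
    return env[tree] if isinstance(tree, str) else tree


def size(tree):
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def paths(tree, prefix=()):
    """All node positions as tuples of child indices (1 = left, 2 = right)."""
    out = [prefix]
    if isinstance(tree, tuple):
        out += paths(tree[1], prefix + (1,)) + paths(tree[2], prefix + (2,))
    return out


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, new):
    """Return a copy of tree with the sub-tree at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = put(tree[path[0]], path[1:], new)
    return tuple(parts)


def score(tree, data, parsimony=0.01):
    """Step (i): RMSE plus a tree-size penalty; lower is better."""
    sse = sum((evaluate(tree, row) - row["target"]) ** 2 for row in data)
    return (sse / len(data)) ** 0.5 + parsimony * size(tree)


def select(scored, group_size, rng):
    """Pick the best-scoring individual of a random group."""
    return min(rng.sample(scored, group_size), key=lambda pair: pair[0])[1]


def crossover(a, b, rng):
    """Replace a random sub-tree of a with a random sub-tree of b."""
    return put(a, rng.choice(paths(a)), get(b, rng.choice(paths(b))))


def mutate(tree, max_depth, rng):
    """Branch mutation: replace a random sub-tree with a newly grown one."""
    return put(tree, rng.choice(paths(tree)), grow(max_depth, rng))


def evolve(data, pop_size=60, generations=6, max_depth=3, group_size=6, seed=0):
    rng = random.Random(seed)
    population = [grow(max_depth, rng) for _ in range(pop_size)]
    best = min(population, key=lambda t: score(t, data))
    for _ in range(generations):                      # step (iii)
        scored = [(score(t, data), t) for t in population]
        population = [best]                           # keep the best so far
        while len(population) < pop_size:             # step (ii)
            parent = select(scored, group_size, rng)
            draw = rng.random()
            if draw < 0.7:
                child = crossover(parent, select(scored, group_size, rng), rng)
            elif draw < 0.9:
                child = mutate(parent, max_depth, rng)
            else:
                child = parent                        # direct reproduction
            if depth(child) > max_depth:              # enforce maximum depth
                child = parent
            population.append(child)
        best = min(population, key=lambda t: score(t, data))
    return best


# Toy data generated from the Figure 1 equation, f(x, y) = x + (y * 5).
data = [{"x": float(x), "y": float(y), "target": x + 5.0 * y}
        for x in range(4) for y in range(4)]
best = evolve(data)
print(best, score(best, data))
```

Because the best individual is always carried over, the best score cannot get worse from one generation to the next; whether the exact generating equation is recovered depends on the random seed and the settings.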
 It is noteworthy that in the classic (or standard) GP approach, the scoring of the population members is simply an error measurement based on the training data and the selected error function; there is no optimization. Here a Powell optimization is used, which means that a number of evaluations are performed to find the minimum error by altering the values of both the variables and the constants in the equation. This typically implies a more robust assignment of fitness (or score).
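The idea of scoring a candidate equation only after its numeric values have been tuned can be illustrated with a small sketch. The code below is not Powell's method itself: it is a simplified stand-in that performs cyclic golden-section line searches along each coordinate axis, without the direction updates that Powell's method adds. The function names, the search span and the toy data are assumptions made for this example.

```python
import math


def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a unimodal one-dimensional function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    return (a + b) / 2.0


def optimize_constants(error, start, span=10.0, tol=1e-8, max_sweeps=50):
    """Cyclic line searches along each axis until the error stops improving.

    Simplified stand-in for Powell's method (no direction-set updates).
    """
    params = list(start)
    best = error(params)
    for _ in range(max_sweeps):
        for i in range(len(params)):
            def along_axis(value, i=i):
                trial = params[:i] + [value] + params[i + 1:]
                return error(trial)
            params[i] = golden_section(along_axis,
                                       params[i] - span, params[i] + span)
        new = error(params)
        if best - new < tol:      # analogous to a tolerance setting
            break
        best = new
    return params, error(params)


# Toy candidate equation: y = a * temp + b, with constants a and b to tune.
temps = [-10.0, -5.0, 0.0, 5.0, 10.0]
observed = [0.9 * t + 2.0 for t in temps]   # synthetic data, a = 0.9, b = 2.0


def rmse_of(params):
    a, b = params
    sse = sum((a * t + b - o) ** 2 for t, o in zip(temps, observed))
    return math.sqrt(sse / len(temps))


constants, err = optimize_constants(rmse_of, [0.0, 0.0])
print(constants, err)
```

Where SciPy is available, `scipy.optimize.minimize(rmse_of, [0.0, 0.0], method="Powell")` applies an actual Powell search to the same error function.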
 Finally, different initial parameters have to be specified in the GP system. Table 1 summarizes the initial system parameters used in this study. The values of the system parameters (e.g., the mutation and crossover rates, the population size) are problem dependent, and are commonly selected by trial-and-error during the model calibration stage. A major advantage of this GP method as compared to classic multiple regression is that the specific model structure is not chosen in advance, but is rather part of the optimization and search process. Furthermore, one of the main advantages of the GP approach over nonlinear methods such as artificial neural networks is that it provides an explicit model structure (i.e., understandable mathematical equations) that can improve our knowledge of the predictand-predictor relationship.
Table 1. Initial GP System Parameters
Round Robin (RR): technique used for selecting parent models. RR Group Size: size of the random group (6 members) chosen for the RR selection; the individual with the best score within the group is selected. Powell Max. Evals: maximum number of evaluations of the error function using the Powell optimizer. Powell Tolerance: parameter that controls how precisely the optimizer should try to locate the minimum. Adaptation Attempts: number of adaptive training iterations (or epochs). Adaptation Halt When (Improvement < 5%): stopping criterion for adaptive training. Terminate When (Generation = 6): stopping criterion for the evolutionary process. Population size: total number of individuals (or models) in each generation. All other parameters are defined and described in the text.
 For the application of the GP downscaling method, a meteorological station located within the Chute-du-Diable basin (station ID#7061560 located at 48°N, 71°W) in north-eastern Canada is chosen. This station has forty years of daily temperature records representing the current climate (1961 to 2000). The large-scale predictor variables (1961–2000) for the study area are derived from the NCEP reanalysis dataset [Kistler et al., 2001] at the closest grid point, 50°N, 71°W. This dataset consists of the large-scale predictor variables presented in Table 2. From the forty years of climate data, the first 30 years (1961–1990) are used for calibrating the downscaling models while the remaining ten years (1991–2000) are used for model validation. For the SDSM, the most important task in the downscaling process is the selection of the most relevant predictor variables. This screening is achieved with linear correlation analysis and scatter plots (between the predictor and predictand variables). The influence of individual predictors varies on a month by month basis; therefore, the most appropriate combination of predictors has to be chosen by looking at the analysis output for all twelve months. Here, six predictor variables have been found to be the most relevant (Table 3) for downscaling with SDSM. For the GP system, the most important predictor variables are selected automatically during the genetic evolution process, and are thus included in the best model (or equation) evolved. In this case, the best performing models evolved by the GP include only two predictor variables (Table 3).
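The calibration/validation split and the correlation-based screening described above can be sketched as follows. This is an illustration only: the record layout, the field names `year` and `tmax`, the helper names and the synthetic values are assumptions, and the monthly analysis and scatter plots used with SDSM are not reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_by_year(records, last_calibration_year=1990):
    """Split records into calibration (1961-1990) and validation (1991-2000)."""
    calibration = [r for r in records if r["year"] <= last_calibration_year]
    validation = [r for r in records if r["year"] > last_calibration_year]
    return calibration, validation


def rank_predictors(records, predictors, predictand):
    """Rank predictors by absolute correlation with the predictand."""
    target = [r[predictand] for r in records]
    scores = {p: pearson([r[p] for r in records], target) for p in predictors}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Synthetic records, one per year for brevity (real data would be daily).
records = []
for year in range(1961, 2001):
    temp = float((year * 7) % 11 - 5)     # made-up predictor values
    p500 = float((year * 3) % 5)          # made-up predictor values
    records.append({"year": year, "temp": temp, "p500": p500,
                    "tmax": temp + 3.0})  # made-up predictand

calibration, validation = split_by_year(records)
print(len(calibration), len(validation))  # 30 10
print(rank_predictors(calibration, ["temp", "p500"], "tmax"))
```

Screening is performed on the calibration period only, so that the validation years remain independent of any choice made while building the model.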
Table 2. Large-Scale Predictor Variables Obtained From NCEP Reanalysis Dataset
Indicates p_, p5 or p8 which represent the variable values near surface, at 500 hPa height or 850 hPa height, respectively.
Definition of variables is the same as in Table 2.
temp, p500, p_v, s500, s850, sphu
temp, p500, p_v, p_z, s850, sphu
 The best evolved model for the downscaling of daily maximum temperature (Tmax) at the Chute-du-Diable station is as follows:
This equation relies on two large scale predictor variables, the mean temperature (temp) and the 500 hPa geopotential height (p500), to generate the local scale maximum temperature series. The evolved model (equation (1)) has an RMSE of 3.54°C on the training set and 3.59°C on the testing set, with a model efficiency index (R²) of 92% (Table 3).
 The best evolved model for the downscaling of daily minimum temperature (Tmin) includes two predictor variables, temp and p5_z (vorticity at 500 hPa height), as follows:
Similar to the previous model, equation (2) uses only two variables (temp and p5_z) to downscale the daily Tmin. The model evolved in equation (2) has an RMSE of 4.65°C on the training set and 4.57°C on the testing set, with a model efficiency index (R²) of 89% (Table 3). The model test statistics (Table 3) indicate that the GP model is more effective in downscaling Tmax (R² = 92%) than Tmin (R² = 89%). In general, the model performance statistics also indicate that the GP based models outperform the SDSM for the downscaling of both Tmin and Tmax. More interestingly, the GP requires only one third of the number of variables used in the SDSM. Table 3 and equations (1) and (2) show that the context-free grammar GP approach can provide very simple and competitive downscaling models as compared to SDSM.
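For reference, the two performance statistics quoted above can be computed as in the following sketch. The text does not state which definition of the model efficiency index is used; the code assumes the common form 1 − SSE/SST (Nash-Sutcliffe efficiency), and the function names and sample values are illustrative only.

```python
import math


def rmse(observed, simulated):
    """Root mean squared error between observed and simulated series."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def efficiency(observed, simulated):
    """Efficiency index, assumed here to be 1 - SSE/SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Made-up daily temperatures (deg C) for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [1.0, 2.0, 3.0, 4.0, 7.0]
print(rmse(observed, simulated))        # sqrt(4 / 5), about 0.894
print(efficiency(observed, simulated))  # 1 - 4 / 10 = 0.6
```

An efficiency of 1 corresponds to a perfect simulation, while a value of 0 means the model is no better than using the observed mean.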
 To further assess the model performance in general, Figure 2 shows the scatter plots of simulated and observed daily Tmax and Tmin for the validation period (1991–2000). For daily Tmax below 0°C (i.e., winter season), SDSM and GP provide similar results, while for daily Tmax above 0°C (i.e., spring, summer and autumn), the GP provides better simulations than SDSM (Figure 2). Conversely, for daily Tmin below 0°C (i.e., from October to May) the GP significantly outperforms the SDSM, while for daily Tmin above 0°C (i.e., from June to September) the GP provides slightly more accurate simulations. Figure 2 also shows that the GP method is more effective at downscaling Tmax than Tmin as indicated in Table 3. In general, the model test results suggest that the GP approach can provide efficient alternative downscaling models for daily extreme temperatures.
 The comparative downscaling results indicate that the GP approach can provide simple and effective downscaling models. It is shown that the GP simulations of daily Tmin are more accurate than those generated by the SDSM. It is also found that the GP slightly outperforms the SDSM for downscaling daily Tmax. In addition, an important advantage of the models evolved by the GP is their simplicity and parsimony as compared to the SDSM model. Furthermore, the GP automatically selects the most relevant predictor variables to establish the predictand-predictors relationship, and does not require a preliminary screening for input variables. Further research should consider evaluating the GP approach for downscaling daily precipitation.
 Main GP routines have kindly been made available by P. A. Whigham. The author gratefully acknowledges the student contribution of X. Shi.