 During their development, global climate models have their properties adjusted, or tuned, in various ways to best match the known state of the Earth's climate system. These desired properties are observables, such as the radiation balance at the top of the atmosphere, the global mean temperature, sea ice, clouds and wind fields. The tuning is typically performed by adjusting uncertain, or even non-observable, parameters related to processes not explicitly represented at the model grid resolution. The practice of climate model tuning has seen an increasing level of attention because key model properties, such as climate sensitivity, have been shown to depend on frequently used tuning parameters. Here we provide insights into how climate model tuning is practically done in the case of closing the radiation balance and adjusting the global mean temperature for the Max Planck Institute Earth System Model (MPI-ESM). We demonstrate that considerable ambiguity exists in the choice of parameters, and present and compare three alternatively tuned, yet plausible configurations of the climate model. The impacts of parameter tuning on climate sensitivity were less than anticipated.
 Model tuning is an integral part of the model development process, but is not extensively discussed in the literature. Although models and their configuration are well-documented, the process through which a particular model configuration comes into being is not, and as a result, the process of selecting a model configuration is shrouded in mystery. In this contribution we address ourselves to this gap in the literature.
 Model tuning is not a well-defined term. Often, model calibration or model tuning is associated with the last step of a broader model development cycle, after structural enhancements, improved parameterizations and refined boundary conditions have been implemented, wherein selected parameters are adjusted so as to better match the model results with some targeted features of the climate system [Randall and Wielicki, 1997]. The idea that models need to be harmonized with observations is of course applicable to the model development process as a whole, as parameterizations and grid configurations are usually selected based on their ability to improve the representation of some aspect of the climate system [Jakob, 2010]. Only seldom do we implement model changes that degrade the performance of a climate model; improved aspects of the model results following a changed parameterization are frequently used as proof of concept. Because this model development happens over generations, and is difficult to describe comprehensively, in this paper the discussion is focused on how a final parameter configuration of a model is selected with an external set of goals in mind.
 The need to tune models became apparent in the early days of coupled climate modeling, when the top of the atmosphere (TOA) radiative imbalance was so large that models would quickly drift away from the observed state. Initially, flux-corrections, which input or extract heat and freshwater from the model, were invented to address this problem [Sausen et al., 1988]. As models gradually improved to the point where flux-corrections were no longer necessary [Colman et al., 1995; Guilyardi and Madec, 1997; Boville and Gent, 1998; Gordon et al., 2000], the practice has become less accepted in the climate modeling community. Instead, the radiation balance is controlled primarily by tuning cloud-related parameters at most climate modeling centers [e.g., Watanabe et al., 2010; Donner et al., 2011; Gent et al., 2011; HadGEM2 Development Team, 2011; Hazeleger et al., 2012], while others adjust the ocean surface albedo [Hourdin et al., 2012] or scale the natural aerosol climatology to achieve radiation balance [Voldoire et al., 2012]. Tuning cloud parameters partly masks the deficiencies in the simulated climate, as there is considerable uncertainty in the representation of cloud processes. But just like adding flux-corrections, adjusting cloud parameters involves a process of error compensation, as it is well appreciated that climate models poorly represent clouds and convective processes. Tuning aims at balancing the Earth's energy budget by adjusting a deficient representation of clouds, without necessarily aiming at improving the latter.
 Arguably, the most basic physical property that we expect global climate models to predict is how the global mean surface air temperature varies naturally, and responds to changes in atmospheric composition and solar insolation. We usually focus on temperature anomalies, rather than the absolute temperature that the models produce, and for many purposes this is sufficient. Figure 1 instead shows the absolute temperature evolution from 1850 till present in realizations of the coupled climate models obtained from the Coupled Model Intercomparison Project phase 3 (CMIP3) [Meehl et al., 2007] and phase 5 (CMIP5) [Taylor et al., 2012] multi-model datasets available to us at the time of writing, along with two temperature records reconstructed from observations [Brohan et al., 2006]. There is considerable coherence between the model realizations and the observations; models are generally able to reproduce the observed 20th century warming of about 0.7 K, and details such as the years of cooling following the volcanic eruptions, e.g., Krakatau (1883) and Pinatubo (1991), are found in both the observed record and most of the model realizations.
 Yet the span between the coldest and the warmest model is almost 3 K, distributed equally far above and below the best observational estimates, while the majority of models are cold-biased. Although the inter-model span is only one percent relative to absolute zero, that argument fails to be reassuring. Relative to the 20th century warming the span is a factor of four larger, while it is about the same as our best estimate of the climate response to a doubling of CO2, and about half the difference between the last glacial maximum and present. Parameterized processes that depend non-linearly on the absolute temperature must be exposed to realistic temperatures if they are to act as intended. Prime examples are processes involving phase transitions of water: evaporation and precipitation depend non-linearly on temperature through the Clausius-Clapeyron relation, while snow, sea-ice, tundra and glacier melt depend critically on temperatures crossing the freezing point in certain regions. The models in CMIP3 were frequently criticized for not being able to capture the timing of the observed rapid Arctic sea-ice decline [e.g., Stroeve et al., 2007]. While this is unlikely to be the only reason, given that sea ice melt occurs at a specific absolute temperature, this model ensemble behavior seems not too surprising when the majority of models start out too cold.
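As a rough illustration of why absolute temperature matters for processes governed by the Clausius-Clapeyron relation, the sketch below evaluates the saturation vapor pressure with the August-Roche-Magnus approximation (an empirical fit chosen here for illustration, not the formulation used in the model) at the observed 1850–1880 global mean of about 13.7°C and at ±1.5 K around it, mimicking the roughly 3 K inter-model span. Applying it to a global mean temperature is a deliberate simplification; the point is only that a 3 K offset changes such a quantity by a substantial fraction, and does so asymmetrically.

```python
import math


def saturation_vapor_pressure_hpa(t_celsius: float) -> float:
    """August-Roche-Magnus approximation over liquid water, in hPa (empirical fit)."""
    return 6.1094 * math.exp(17.625 * t_celsius / (t_celsius + 243.04))


T_OBS = 13.7  # approximate 1850-1880 observed global mean temperature (deg C)
SPAN = 3.0    # approximate coldest-to-warmest model span (K)

e_cold = saturation_vapor_pressure_hpa(T_OBS - SPAN / 2)
e_obs = saturation_vapor_pressure_hpa(T_OBS)
e_warm = saturation_vapor_pressure_hpa(T_OBS + SPAN / 2)

# Fractional change of e_s per kelvin near the observed temperature (central difference)
rate_per_k = (
    saturation_vapor_pressure_hpa(T_OBS + 0.5) - saturation_vapor_pressure_hpa(T_OBS - 0.5)
) / e_obs

# Relative difference in e_s between a model at the warm end and one at the cold end
span_ratio = e_warm / e_cold - 1.0

print(f"e_s change per K near {T_OBS} C: {100 * rate_per_k:.1f} %")
print(f"e_s difference across the {SPAN} K span: {100 * span_ratio:.1f} %")
```

Because the curve is convex, the warm-side increase exceeds the cold-side decrease, so biases of equal size in either direction do not cancel in such processes.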
 In addition to targeting a TOA radiation balance and a global mean temperature, model tuning might strive to address additional objectives, such as a good representation of the atmospheric circulation, tropical variability or sea-ice seasonality. But in all these cases it is usually to be expected that improved performance arises not because uncertain or non-observable parameters match their intrinsic value (although this would clearly be desirable), but rather because compensation among model errors is occurring. This raises the question of whether tuning a model influences model behavior, and places the burden on the model developers to articulate their tuning goals, as including quantities in model evaluation that were targeted by tuning is of little value. Evaluating models based on their ability to represent the TOA radiation balance usually reflects how closely the models were tuned to that particular target, rather than the models' intrinsic qualities.
 These issues motivate our present contribution, in which we both document and reflect on the model tuning that accompanied the preparation of a new version of our model system for participation in CMIP5. Through the course of preparation we took note of the decision-making process applied in selecting and adjusting parameters, and these notes are elaborated upon in Section 2. Because a number of the authors were new to model development, the tuning exercise served as a learning process, one in which questions emerged that might have been taken for granted by the more experienced model developers, but are nonetheless of interest. As decisions were made, often in the interest of expediency, a nagging question remained unanswered: to what extent did our results depend on the decisions we had just made? Although the idea of a perturbed physics ensemble, in which simulations with different parameter settings are explored, was introduced partly to address this very question [Stainforth et al., 2005], such an ensemble tends to produce models with unlikely parameter settings [Rodwell and Palmer, 2007], whereas during the tuning process we adjust parameters in a more goal-oriented way.
 After our model was frozen and the CMIP5 production runs were initiated, we therefore revisited some of our earlier decisions and asked how our model might have differed had a slightly different path been followed. In so doing we created a small number of alternative worlds: model configurations that were tuned by following a different branch in our tuning strategy. Using these alternative “worlds” as plausible configurations of our model that could have emerged in the development process, we explore how sensitive our model system is to the details of its configuration. These simulations help answer the question as to how much tuning really improves a model, and what aspects of its critical behavior, for instance its patterns of variability or its climate sensitivity, depend on the tuning. Our hope is that this discussion will help to demystify the climate model tuning process.
2. Tuning the Model Climate
 A few model properties can be tuned with a reasonable chain of understanding from model parameter to the impact on model representation, among them the global mean temperature. It is understandable that increasing the model's low-level cloudiness, for instance by reducing the precipitation efficiency, will cause more reflection of the incoming sunlight, and thereby ultimately reduce the model's surface temperature. Likewise, we can slow down the Northern Hemisphere mid-latitude tropospheric jets by increasing orographic drag, and we can control the amount of sea ice by tinkering with the uncertain geometric factors of ice growth and melt. In a typical sequence, we would first try to correct Northern Hemisphere tropospheric wind and surface pressure biases by adjusting parameters related to the parameterized orographic gravity wave drag. Then we tune the global mean temperature as described in Sections 2.1 and 2.3, and, after some time, when the coupled model climate has come close to equilibrium, we tune the Arctic sea ice volume (Section 2.4). In many cases, however, we do not know how to tune a certain aspect of a model that we care about representing with fidelity, for example tropical variability, the Atlantic meridional overturning circulation strength, or sea surface temperature (SST) biases in specific regions. In these cases we instead monitor these aspects and make decisions on the basis of a weak understanding of the relation between model formulation and model behavior.
 Formulating and prioritizing our goals is challenging. To us, a global mean temperature in close absolute agreement with observations is of highest priority because it sets the stage for temperature-dependent processes to act. For this, we target the 1850–1880 observed global mean temperature of about 13.7°C [Brohan et al., 2006]. Beyond that, we prioritize having globally averaged TOA shortwave absorption and outgoing longwave radiation in good agreement with satellite observations, along with a representation of important climate variability modes. We would accept a model if the global mean cloud cover is above 60 percent in present-day climate, even if satellite estimates are generally higher, and if the global mean liquid water path lies only in the range 50–80 g m⁻², which is consistent with estimates over the oceans from microwave instruments onboard satellites [e.g., O'Dell et al., 2008], while the bulk of observational estimates would allow a broader range.
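The acceptance criteria above can be written down as a simple checklist. This is a hypothetical sketch: the container and function names are illustrative, and the temperature tolerance is an assumption because the text names a target but no tolerance. It also does not capture the priority ordering or the comparison of TOA fluxes with satellite data.

```python
from dataclasses import dataclass


@dataclass
class GlobalMeans:
    """Globally averaged diagnostics from a simulation (hypothetical container)."""
    temperature_c: float      # pre-industrial global mean surface air temperature (deg C)
    cloud_cover_pct: float    # present-day global mean cloud cover (percent)
    liquid_water_path: float  # present-day global mean liquid water path (g m-2)


TARGET_T_C = 13.7            # 1850-1880 observed global mean [Brohan et al., 2006]
T_TOLERANCE_K = 0.3          # ASSUMPTION: tolerance not stated in the text
MIN_CLOUD_COVER_PCT = 60.0   # accept only above 60 percent
LWP_RANGE = (50.0, 80.0)     # accepted liquid water path range (g m-2)


def acceptance_report(m: GlobalMeans) -> dict:
    """Return, per criterion, whether the simulation satisfies it."""
    return {
        "temperature": abs(m.temperature_c - TARGET_T_C) <= T_TOLERANCE_K,
        "cloud_cover": m.cloud_cover_pct > MIN_CLOUD_COVER_PCT,
        "liquid_water_path": LWP_RANGE[0] <= m.liquid_water_path <= LWP_RANGE[1],
    }


# Hypothetical diagnostics for a candidate configuration
candidate = GlobalMeans(temperature_c=13.8, cloud_cover_pct=63.0, liquid_water_path=65.0)
print(acceptance_report(candidate))
```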
 We further put emphasis on having a consistent overall strategy for tuning our model, whereby most parameters are set to published values and changes among different model versions and resolutions are kept to a minimum. Although this may mean that sub-optimal parameter settings are used in the short term, we believe that this strategy increases the utility of the model (for instance when results are compared across different configurations), and may facilitate the long-term development of the model. If model parameters vary significantly from one model version to the next, it is not easy to know where deficiencies or improvements arise from.
 The experiments presented below were conducted by modifying the Max Planck Institute Earth System Model at base-resolution (MPI-ESM-LR (M. Giorgetta et al., Climate variability and climate change in MPI-ESM CMIP5 simulations, manuscript in preparation, 2012)), which consists of ECHAM6 version 6.0, at T63 spectral resolution with 47 vertical levels (B. Stevens et al., The Atmospheric Component of the MPI-M Earth System Model: ECHAM6, manuscript in preparation, 2012), including the JSBACH land model (V. Brovkin et al., Evaluation of vegetation cover and land-surface albedo in MPI-ESM CMIP5 simulations, submitted to Journal of Advances in Modeling Earth Systems, 2012), coupled to the MPIOM ocean model at 1.5 degree resolution with 40 vertical levels (J. Jungclaus et al., MPIOM: Characteristics of the ocean simulations, manuscript in preparation, 2012).
2.1. The Tuning Process
 We tune the radiation balance with the main goal of controlling the pre-industrial global mean temperature, balancing the TOA net longwave flux via the greenhouse effect and the TOA net shortwave flux via the albedo effect. The methodology of tuning the radiation balance may vary between model development groups, and is usually adapted to the specific goals and constraints of the exercise. After a problem has been identified in the coupled climate model, we iterate the following steps until a satisfactory solution is found:
1. Short runs of single months, or if possible one or more years, with prescribed observed SSTs and sea ice concentration; first with reference parameter settings, and then with altered parameter settings.
2. A longer simulation with the altered parameter settings obtained in step 1 and observed SSTs, currently 1976–2005 from the Atmospheric Model Intercomparison Project (AMIP), is compared with the observed climate.
3. Implement the changes in the coupled model setup to run under pre-industrial conditions and evaluate the altered climate. Frequently, we make small parameter changes in this step to fine-tune the climate, without first revisiting steps 1 and 2.
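Steps 1 and 2 amount to searching for a parameter value that brings the TOA imbalance to a target. A minimal sketch of one way to automate that search, assuming the response of the imbalance to a single parameter is smooth, is a secant iteration: each evaluation of the hypothetical `run_short` function stands for one short prescribed-SST run, and the first two evaluations correspond to the reference and altered settings of step 1. The toy response function and all numbers are invented for illustration; the coupled check of step 3 is outside the sketch.

```python
from typing import Callable


def secant_tune(
    run_short: Callable[[float], float],  # parameter value -> TOA net imbalance (W m-2)
    p0: float,                            # reference parameter setting
    p1: float,                            # first altered parameter setting
    target: float,                        # desired TOA net imbalance (W m-2)
    tol: float = 0.05,                    # acceptable mismatch (W m-2)
    max_iter: int = 10,
) -> float:
    """Secant iteration on R(p) - target, one short run per iteration."""
    r0 = run_short(p0) - target
    r1 = run_short(p1) - target
    for _ in range(max_iter):
        if abs(r1) <= tol:
            return p1
        if r1 == r0:
            raise ValueError("parameter change had no detectable effect on the imbalance")
        # Linear extrapolation through the last two runs
        p0, p1 = p1, p1 - r1 * (p1 - p0) / (r1 - r0)
        r0, r1 = r1, run_short(p1) - target
    raise RuntimeError("no parameter value met the tolerance")


def hypothetical_toa_imbalance(p: float) -> float:
    """Toy stand-in for a short run: imbalance falls as clouds become more reflective."""
    return 1.0 - 10.0 * (p - 0.70) - 30.0 * (p - 0.70) ** 2


TARGET = 0.6
p_tuned = secant_tune(hypothetical_toa_imbalance, 0.70, 0.75, TARGET)
print(f"tuned parameter: {p_tuned:.3f}")
```

A secant step needs no derivative of the model, only the two most recent runs, which matches how the reference and altered runs of step 1 are used.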
 Our tuning process resembles the protocol described by Gent et al. for the preparation of CCSM4. They tune the individual model components first in uncoupled mode (step 2), and after coupling (step 3) they allow changing only one cloud parameter to adjust the radiation balance, and the sea ice albedo to adjust the Arctic sea ice volume. A somewhat different approach was taken by Watanabe et al., who tuned a number of parameters related to the cloud, convection, turbulence, aerosol, and sea ice schemes iteratively every 5 years, while running their model (MIROC5) in coupled mode for about a thousand years. Below we explain some of the most important parameters that we use to tune the radiation balance.
2.2. Cloud Processes and the Radiation Balance
 ECHAM6 predicts cloud fraction based on the relative humidity [Sundqvist et al., 1989], distinguishes liquid and ice clouds [Lohmann and Roeckner, 1996], and accounts for vertical transport by shallow and deep convective clouds [Tiedtke, 1989; Nordeng, 1994]. The major uncertain climate-related cloud processes which are frequently used for tuning ECHAM are illustrated in Figure 2, some of which will be explored below, while Figure 3 shows the influence of five model parameters on globally averaged model properties, including the TOA net, longwave and shortwave fluxes, cloud cover, cloud liquid water- and water vapor paths.
2.2.1. Cloud Inhomogeneity
 Most climate models represent clouds with their fractional coverage in each grid cell. One can picture them as ‘cloud-boxes’ stacked vertically, under assumptions of how they overlap spatially to yield the total cloud cover. Each cloud-box contains cloud liquid and/or ice, and we usually have no information on how that condensate is distributed horizontally. The radiative properties of clouds depend non-linearly on their thickness: a cloud twice as thick is less than twice as reflective, all other things being equal, and all but very thin clouds behave as nearly ideal black body absorbers in the infrared spectrum. In reality, cloud thickness varies significantly on the spatial scale of a typical climate model grid cell, and therefore the mean cloud radiative effect is smaller than the radiative effect of the mean model cloud. In ECHAM6 the effect of cloud inhomogeneity is modeled by multiplying the cloud liquid and ice contents by respective homogeneity factors before radiation is calculated [Cahalan et al., 1994]. A perfectly homogeneous cloud has a homogeneity factor of unity, while inhomogeneous clouds have factors less than one. At increasing resolution the homogeneity factor should in principle increase, as one begins to resolve some of the cloud inhomogeneities. There is some evidence that the inhomogeneity parameters can be replaced by making assumptions about the sub-grid scale distribution of cloud water [Tompkins, 2002] and incorporating this information into radiation calculations [Pincus et al., 2006]. Figure 3, right column, shows that the homogeneity factor for liquid clouds barely influences any of the global quantities that we monitor other than the TOA net shortwave flux, and thereby the TOA imbalance. This makes the parameter convenient as a tuning parameter for closing the radiation balance. The cloud homogeneity factor for liquid clouds was used as the only parameter by Hazeleger et al. to tune the EC-EARTH model.
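The argument that the mean cloud reflects less than a cloud with the mean condensate can be checked numerically. The sketch below assumes a simple two-stream albedo for a non-absorbing layer, α(τ) = (1 − g)τ / (2 + (1 − g)τ) with an assumed asymmetry parameter g ≈ 0.85, and a hypothetical lognormal sub-grid distribution of optical depth τ. It then asks what factor applied to the mean optical depth reproduces the mean albedo; if τ scales linearly with condensate (fixed droplet size, also an assumption), that factor plays the role of a homogeneity factor. This is an illustration of the concavity argument, not the ECHAM6 radiation code.

```python
import numpy as np

ONE_MINUS_G = 0.15  # ASSUMPTION: 1 - asymmetry parameter for liquid droplets (g ~ 0.85)


def two_stream_albedo(tau):
    """Albedo of a non-absorbing cloud layer in a simple two-stream approximation."""
    x = ONE_MINUS_G * np.asarray(tau, dtype=float)
    return x / (2.0 + x)


def optical_depth_for_albedo(alpha):
    """Inverse of two_stream_albedo: optical depth that yields albedo alpha."""
    return 2.0 * alpha / (ONE_MINUS_G * (1.0 - alpha))


rng = np.random.default_rng(seed=0)
# ASSUMPTION: lognormal sub-grid distribution of optical depth within one grid cell
tau = rng.lognormal(mean=np.log(10.0), sigma=0.8, size=100_000)

mean_albedo = float(two_stream_albedo(tau).mean())     # reflection by the variable cloud
albedo_of_mean = float(two_stream_albedo(tau.mean()))  # reflection by a uniform cloud
homogeneity_factor = optical_depth_for_albedo(mean_albedo) / tau.mean()

print(f"mean albedo {mean_albedo:.3f}, albedo of mean cloud {albedo_of_mean:.3f}")
print(f"equivalent homogeneity factor {homogeneity_factor:.2f}")
```

Because α(τ) is concave, the mean albedo is always below the albedo of the mean cloud, so the equivalent factor comes out below one; a wider assumed distribution lowers it further.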
2.2.2. Moist Shallow Convective Processes
 Vertical transport by convective updrafts, and sometimes downdrafts, is parameterized in global climate models mostly by means of mass-flux schemes of varying complexity. In essence, these schemes diagnose the mass-flux of the updrafts, the entrainment of air into the updraft from the surroundings and detrainment of air out of the updraft as functions of height. The three are connected by mass conservation to provide two independent updraft properties. More mass-flux leads naturally to more transport, while the role of entrainment is more complicated. Typically, more entrainment will act to reduce the buoyancy of the updraft making the convection less vigorous and thereby less efficient. In ECHAM6 a modified version of the Tiedtke convection-scheme [Tiedtke, 1989; Nordeng, 1994] is applied, which distinguishes between shallow, deep and mid-level convection in its formulation (Figure 2). This scheme uses a single updraft to effectively represent the real-world spectrum of convective cloud updrafts of varying sizes and strengths.
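The bulk mass-flux idea can be sketched with a single entraining and detraining plume. Under the usual bulk-plume assumptions, mass continuity gives dM/dz = (ε − δ)M, and a conserved updraft property φ (standing in here for the updraft's excess over its environment, ignoring condensation) is diluted as dφ/dz = −ε(φ − φ_env). The sketch integrates these with forward Euler; the entrainment rates, level spacing and initial excess are hypothetical and are not the ECHAM6 values.

```python
import numpy as np


def bulk_plume(m0, phi_u0, phi_env, eps, delta, dz, n_levels):
    """Forward-Euler integration of a single bulk updraft.

    dM/dz     = (eps - delta) * M          (mass continuity)
    dphi_u/dz = -eps * (phi_u - phi_env)   (dilution by entrained environmental air)
    """
    m = np.empty(n_levels + 1)
    phi = np.empty(n_levels + 1)
    m[0], phi[0] = m0, phi_u0
    for k in range(n_levels):
        m[k + 1] = m[k] + dz * (eps - delta) * m[k]
        phi[k + 1] = phi[k] - dz * eps * (phi[k] - phi_env)
    return m, phi


DZ = 10.0       # level spacing (m)
N_LEVELS = 200  # 2 km of ascent
# Hypothetical entrainment rates (m-1); detrainment set equal so the mass flux stays constant
m_weak, phi_weak = bulk_plume(1.0, 1.0, 0.0, 0.3e-3, 0.3e-3, DZ, N_LEVELS)
m_strong, phi_strong = bulk_plume(1.0, 1.0, 0.0, 1.0e-3, 1.0e-3, DZ, N_LEVELS)

print(f"excess left after 2 km: weak entrainment {phi_weak[-1]:.2f}, strong {phi_strong[-1]:.2f}")
```

The stronger-entraining plume loses most of its initial excess over the same depth, which is the sense in which more entrainment makes convection less vigorous.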
 With respect to the radiation balance, the main effect of shallow convection is to export cloud water from the boundary layer to the free atmosphere, where it tends to evaporate, thereby reducing the thickness and extent of the boundary-layer clouds. The strength of this process is influenced mainly by two model parameters. The convective mass-flux above the level of non-buoyancy (leftmost column of Figure 3) represents the most vigorous fraction of the updraft ensemble, which ‘overshoots’ from the level where the mean updraft loses its buoyancy to the next model level. The overshooting parameterization is conceptually unsatisfactory, and in the future we hope to replace it by a formulation involving the vertical updraft velocity. Increasing the parameter leads to fewer and thinner boundary-layer clouds, which increases surface temperature because more sunlight is absorbed by the system. Increasing the entrainment rate for shallow convection instead (second column of Figure 3) has the opposite effect on the cloud fields and the radiation balance: increased entrainment dilutes the updrafts, making them weaker, so that more cloud liquid water is retained in the boundary-layer clouds.
2.2.3. Deep Convective Processes
 The parameterization of deep convection plays a more complex role in a climate model than shallow convection. Deep convective processes control basic features of the Tropical mean circulation, and are responsible for most of the Tropical rainfall. They are central to Tropical variability and help determine the vertical temperature structure in the Tropics.
 In relation to the radiation balance, the lateral entrainment rate for deep convection acts much like that for shallow convection, with the important difference that the increase in low-level cloud cover with increasing entrainment is to some extent compensated by a loss of high-level ice clouds from the outflow of deep convection. Water vapor also increases with increasing entrainment rate, because more water is mixed into the free troposphere as less water rains directly out of the weakening updrafts. Convective cloud water detrainment is associated with a cooling of the upper troposphere due to evaporation on the one hand, and with radiative warming from the formation of cirrus clouds on the other.
 Increasing the conversion rate of cloud water to rain in convective systems generally leads to less cloud cover and less atmospheric water vapor, as more water is deposited directly from deep convective systems to the surface. With this parameter, the TOA net shortwave and net longwave fluxes both increase in magnitude, while the TOA radiation balance is approximately maintained. This makes the parameter useful for adjusting the level of the TOA net shortwave and longwave fluxes.
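To give a feel for what the conversion rate controls, the sketch below treats rain formation as first-order decay of cloud water, dl/dt = −c·l, and compares the two rates quoted in Section 2.5 (1.0·10⁻⁴ and 2.0·10⁻⁴ s⁻¹). Both the first-order form and the residence time of condensate in the updraft are assumptions made for illustration; the actual scheme relates conversion to height and updraft properties.

```python
import math


def remaining_fraction(rate_per_s: float, residence_time_s: float) -> float:
    """Fraction of cloud water not yet converted to rain, for dl/dt = -rate * l."""
    return math.exp(-rate_per_s * residence_time_s)


RESIDENCE_TIME_S = 1200.0  # ASSUMPTION: illustrative time condensate spends in the updraft

for rate in (1.0e-4, 2.0e-4):
    frac = remaining_fraction(rate, RESIDENCE_TIME_S)
    print(
        f"c = {rate:.1e} s-1: e-folding time {1 / rate:.0f} s, "
        f"{100 * frac:.0f} % of cloud water left for detrainment"
    )
```

Under this assumption, doubling c squares the surviving fraction, so less condensate is detrained to form cloud and moisten the free troposphere, consistent with the lower cloud cover and water vapor described above.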
2.3. Controlling the Global Mean Surface Temperature and Climate Drift
 A particular problem when tuning a coupled climate model is that it takes thousands of years for the deep ocean to equilibrate. In many cases, it is not computationally feasible to redo such long simulations several times. Therefore it is valuable to estimate the equilibrium temperature with good precision long before equilibrium is actually reached. Ideally, one would like to think that if we tune our model to have a TOA radiation imbalance that closely matches the observed ocean heat uptake in simulations where SSTs are prescribed to the present-day observed state with all relevant forcings applied, then the coupled climate model will attain a global mean temperature in reasonable agreement with observations. Recent studies suggest that the ocean heat uptake is 0.5–0.7 W m⁻² when averaged over the Earth's total surface area, indicating that the present-day climate is out of balance [Hansen et al., 2011; Stevens and Schwartz, 2012]. There are at least three reasons why adhering to this ideal need not be successful:
1. Climate models may not exactly conserve energy.
2. The climate sensitivity of the model to the various forcings may not match the real climate system, and the forcings themselves may be erroneous.
3. Local SST biases in the coupled model may influence the atmospheric state, for example cloudiness, and thereby shift the global mean temperature.
 To investigate whether climate models leak energy, Figure 4 shows the relation between TOA energy imbalance and global mean temperature for MPI-ESM-LR (blue) and the CMIP3 (light gray) and CMIP5 (gray) multi-model ensembles from control simulations of pre-industrial climate. Climate drift is indicated by the trails, and most models have fairly low drift during the typically 500-year long control runs. Some models drift considerably, up to 1 K. Models will relax slowly towards their equilibrium state approximately along slopes corresponding to their climate sensitivity, as indicated by the blue and red arrows [e.g., Gregory et al., 2004]. Among the model simulations whose data were available at the time of this analysis, there is a tendency for drift in the CMIP5 models to be less pronounced than in some of the CMIP3 models, and there is a reduction in the number of warm and cold biased models in CMIP5. Only a few models are close to zero imbalance, or likely to relax to near-zero imbalance. If a model equilibrates at a positive radiation imbalance it indicates that it leaks energy, which appears to be the case in the majority of models, and if the equilibrium balance is negative it means that the model has artificial energy sources. We speculate that the fact that the bulk of models exhibit positive TOA radiation imbalances, and at the same time are cold-biased, is due to them having been tuned without account for energy leakage.
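The slow relaxation along a slope set by the feedback, and the fact that a leaking model settles at a positive imbalance equal to its leak, can be illustrated with a single-box energy balance model. This is a caricature under stated assumptions: one heat reservoir with an assumed effective heat capacity C (real coupled models have a deep ocean with far longer time scales), a linear TOA response R(T) with feedback λ < 0, and a constant leakage L, so that C dT/dt = R(T) − L. All numbers are hypothetical.

```python
import numpy as np


def relax_one_box(t0, r_at_t0, feedback, leakage, heat_capacity, years, dt=1.0):
    """Integrate C dT/dt = R(T) - L with R(T) = r_at_t0 + feedback * (T - t0).

    dt is in years and heat_capacity in W yr m-2 K-1.
    Returns arrays of global mean temperature and TOA imbalance.
    """
    n = int(years / dt)
    temps = np.empty(n + 1)
    imbalance = np.empty(n + 1)
    temps[0] = t0
    for i in range(n + 1):
        imbalance[i] = r_at_t0 + feedback * (temps[i] - t0)
        if i < n:
            # Only R - L heats the system; the leaked part is lost
            temps[i + 1] = temps[i] + dt * (imbalance[i] - leakage) / heat_capacity
    return temps, imbalance


T0, R0 = 13.0, 1.1   # hypothetical starting temperature (deg C) and TOA imbalance (W m-2)
FEEDBACK = -1.2      # hypothetical lambda = dR/dT (W m-2 K-1)
LEAKAGE = 0.5        # hypothetical energy leakage L (W m-2)
HEAT_CAPACITY = 8.0  # assumed effective heat capacity (W yr m-2 K-1)

temps, imbalance = relax_one_box(T0, R0, FEEDBACK, LEAKAGE, HEAT_CAPACITY, years=300)
print(f"final T = {temps[-1]:.2f} C, final R = {imbalance[-1]:.2f} W m-2")
```

In the (T, R) plane the trajectory moves along a straight line of slope λ and stops where R equals L rather than zero, which is how a positive equilibrium imbalance signals an energy leak.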
 We investigated the energy leakage of about 0.5 W m⁻² in MPI-ESM-LR and found that it arises for the most part from mismatching grids and coastlines between the atmosphere and ocean model components. Further, some energy is lost through an inconsistent treatment of the temperature of precipitation and river runoff entering the ocean, and a small leakage of about 0.05 W m⁻² occurs in a not yet identified part of the atmosphere. When run with prescribed SSTs and sea ice under present-day conditions and forcings (1976–2005), ECHAM6 has a TOA imbalance of 0.53 W m⁻², which is barely enough to compensate the coupled model's energy leakage. This would indicate that the model should be too cold when run in coupled mode, because the imbalance left after accounting for the leakage (about 0.03 W m⁻²) is well below the present-day observed ocean heat uptake (0.5–0.7 W m⁻²). Yet the coupled model arrives relatively close to the pre-industrial temperature. We shall see more examples of this behavior in Section 3.
 For these reasons we fine-tune the global mean temperature in coupled mode. Once we know the energy leakage of the coupled climate model system, it is relatively easy to estimate the equilibrium temperature based on short simulations. Let R be the TOA net radiation imbalance, L the model energy leakage, T the current global mean temperature, and λ = ∂R/∂T the climate feedback factor (negative for a stable climate). Since the model equilibrates where R = L, the equilibrium temperature (Teq) can be derived from the linearized energy balance equation:

$$T_{eq} = T - \frac{R - L}{\lambda}$$
The imbalance and global mean temperature should be averaged over a long period, at least 10 years and sometimes longer, while the feedback can be estimated by regression if it is not known a priori. If the estimated equilibrium temperature falls outside the target range, measures can be taken to adjust the radiation balance as described in Section 2.1. In our experience it is possible to tune our model's equilibrium temperature successfully within about a hundred years of coupled simulation, well before equilibrium is reached.
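The estimate T_eq = T − (R − L)/λ can be sketched as follows, assuming annual global means from a coupled run are at hand. If λ is not supplied, it is taken as the slope of an ordinary least-squares fit of R against T (`numpy.polyfit` with degree 1), in the spirit of the regression mentioned above; in practice this slope is noisy for short, nearly stationary runs, so a prior estimate of λ may be preferable. The function name is illustrative, and the synthetic series is noise-free and purely illustrative.

```python
import numpy as np


def equilibrium_temperature(temps, imbalance, leakage, feedback=None):
    """Estimate T_eq = T - (R - L) / lambda from annual global means of a coupled run.

    temps:     annual global mean temperatures (deg C or K)
    imbalance: annual TOA net imbalances R (W m-2), positive downward
    leakage:   model energy leakage L (W m-2)
    feedback:  lambda = dR/dT (W m-2 K-1); if None, estimated by regressing R on T
    """
    temps = np.asarray(temps, dtype=float)
    imbalance = np.asarray(imbalance, dtype=float)
    if feedback is None:
        feedback = np.polyfit(temps, imbalance, 1)[0]  # slope of the R-versus-T line
    t_mean = temps.mean()
    r_mean = imbalance.mean()
    return t_mean - (r_mean - leakage) / feedback, feedback


years = np.arange(30)
temps = 13.0 + 0.01 * years               # synthetic slow warming (deg C)
imbalance = 0.5 - 1.2 * (temps - 13.7)     # synthetic R = L + lambda * (T - T_eq), lambda = -1.2
t_eq, lam = equilibrium_temperature(temps, imbalance, leakage=0.5)
print(f"estimated lambda = {lam:.2f} W m-2 K-1, T_eq = {t_eq:.2f} C")
```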
2.4. Tuning the Arctic Sea Ice
 The decline of Arctic sea ice extent in recent decades has certainly caught the attention of the scientific community, as well as the public, and in the past it has been difficult for models to simulate this decline [e.g., Stroeve et al., 2007; Rampal et al., 2011]. As argued in the introduction, a key to simulating the evolution of the observed sea ice extent is to have a reasonable surface temperature, while the mean thickness of the sea ice and the parameterization of surface albedo are also important factors determining the susceptibility to external forcing [e.g., Holland and Bitz, 2003]. Unfortunately, sea ice thickness is a challenging quantity to observe: most of our knowledge is based on submarine records from the central parts of the Arctic, and only recently has it been possible to obtain sporadic observations from satellite altimetry [Kwok and Rothrock, 2009]. Based on what we currently know, we aim for an annual mean Arctic ice volume of about 20–25·10¹² m³ in our pre-industrial climate, which corresponds to a mean sea ice thickness of about 2–2.5 m in the Northern Hemisphere given a sea ice area of about 10·10¹² m².
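The correspondence between the targeted volume and mean thickness is simple division of volume by ice-covered area. The short sketch below reproduces it from the numbers in the paragraph above; the constant and function names are illustrative.

```python
SEA_ICE_AREA_M2 = 10e12             # approximate Northern Hemisphere sea ice area (m2)
TARGET_VOLUME_M3 = (20e12, 25e12)   # targeted annual mean Arctic ice volume range (m3)


def mean_thickness_m(volume_m3: float, area_m2: float) -> float:
    """Mean ice thickness implied by a given ice volume spread over a given ice area."""
    return volume_m3 / area_m2


low, high = (mean_thickness_m(v, SEA_ICE_AREA_M2) for v in TARGET_VOLUME_M3)
print(f"implied mean thickness: {low:.1f}-{high:.1f} m")
```

Since thickness here is volume divided by area, a model with a different ice area would need a different volume to reach the same mean thickness.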
 In the past, the parameterization of snow and sea ice albedo was often used to tune the sea ice volume. Eisenman et al. argue, using an energy balance model, that sea ice thickness is very sensitive to even small changes in the model ice albedo, while DeWeaver et al. show that in their fully coupled climate model several compensating processes limit the effectiveness of tuning sea ice with albedo. Either way, detailed and accurate observations of snow and sea ice albedos are now available, and in ECHAM6 we apply a scheme that is better constrained by these empirical observations, including processes such as the aging of snow and a representation of melt ponds on top of the sea ice [Pedersen et al., 2009; E. Roeckner et al., Impact of melt ponds on Arctic sea ice in past and future climates as simulated by MPI-ESM, manuscript in preparation, 2012]. For these reasons, we have for now abandoned the strategy of tuning the sea ice with the surface albedo parameters.
 The sea ice in the MPI-ESM-LR model is represented by a fractional coverage and a mean sea ice thickness in every grid cell [Hibler, 1979]. As new ice is formed, it is not readily known whether the new ice primarily acts to thicken the existing ice or mainly increases the fraction of the grid cell that is ice covered. Likewise, as the sea ice melts, it is not known whether it does so from the top and bottom, or from the sides. In MPI-ESM-LR the geometry of the melting and freezing processes is controlled by two non-dimensional parameters, cmelt and cfreeze, which can be varied between zero and one.
 Changing these two parameters has little impact on Antarctic sea ice (Figure 5). In the Arctic, varying cfreeze up and down from the MPI-ESM-LR model default value (2/3) allows changing the sea ice volume moderately by ±·10¹² m³. Increasing cfreeze permits more open ocean to exist during freeze-up, which enhances the ocean heat-loss to the atmosphere and thereby allows more sea ice to form. This is because even a thin layer of ice is effective in insulating the upper ocean, thereby reducing heat loss and inhibiting further sea ice formation. Even though the process is only effective in the beginning of the winter, the signature is seen in the ice volume throughout the year, while there is almost no impact on the sea ice area. The cmelt parameter had a small impact on sea ice area during the Arctic melt-season, but no impact in other seasons, nor on the sea ice volume.
2.5. The Final Tuning of MPI-ESM-LR for Participation in CMIP5
 The model development and final parameter tuning leading up to the new MPI-ESM-LR model targeted deficiencies identified in the predecessor coupled climate model ECHAM5/MPIOM. Here we focus on the parts where tuning played a central role. The pre-industrial control run global mean temperature was warm-biased at about 14.3°C, and the control simulation exhibited a weak drift of about 0.2 K in 500 years (Figure 4), while at the same time the model had too thick Arctic sea ice, with a maximum centered near the North Pole rather than north of the Canadian Archipelago as observed. Further, the radiation balance of ECHAM5 was tuned to be close to the ERBE satellite estimates of outgoing longwave radiation (OLR), while newer CERES estimates indicate a higher level of OLR.
 ECHAM6 incorporates a range of structural model improvements over ECHAM5, including increased atmospheric vertical resolution that better represents the stratosphere, a new formulation of the snow and ice albedo, a minor change to the convection excess buoyancy, updated ozone and aerosol climatologies, an improved coupling between the atmosphere and the land surface, and several bug fixes. After these changes were introduced, the first parameter change was to reduce two non-dimensional parameters controlling the strength of orographic wave drag from 0.7 to 0.5. This greatly reduced the low zonal-mean wind and sea-level pressure biases in the Northern Hemisphere in atmosphere-only simulations, had a positive impact on the global-to-Arctic temperature gradient, and made the distribution of Arctic sea ice far more realistic when run in coupled mode. In a second step, the conversion rate of cloud water to rain in convective clouds (Section 2.2.3) was doubled from 1.0·10⁻⁴ s⁻¹ to 2.0·10⁻⁴ s⁻¹ in order to raise the OLR closer to the CERES satellite estimates.
 At this point it was clear that the new coupled model was too warm compared with our target pre-industrial temperature. Different measures using the convection entrainment rates, the convection overshooting fraction and the cloud homogeneity factors were tested to reduce the global mean temperature. In the end, we decided to rely primarily on an increase of the homogeneity factor for liquid clouds from 0.70 to 0.77, combined with a slight reduction of the convective overshooting fraction from 0.22 to 0.21, thereby making low-level clouds more reflective to reduce the surface temperature bias. The global mean temperature was then sufficiently close to our target value, and drift was very weak. At this point we decided to increase the Arctic sea ice volume from 18·10¹² m³ to 22·10¹² m³ by raising the cfreeze parameter from 1/2 to 2/3 (ECHAM5/MPIOM had this parameter set to 4/5). These three final parameter adjustments were made while running the model in coupled mode.
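The reasoning behind making clouds more reflective to remove a warm bias can be put into rough numbers with a zero-dimensional energy balance. The sketch below is only that: the solar constant (1361 W m⁻²) and the feedback parameter are assumed round values, the function names are hypothetical, and it ignores that changing cloud homogeneity also alters longwave effects and feedbacks. It shows that one percentage point of planetary albedo corresponds to roughly 3.4 W m⁻² of absorbed shortwave radiation.

```python
# Zero-dimensional energy-balance arithmetic; S0 and the feedback parameter
# are assumed round numbers, not values taken from MPI-ESM.
S0 = 1361.0  # total solar irradiance [W m-2] (assumed)


def toa_imbalance(planetary_albedo: float, olr: float) -> float:
    """Net downward TOA flux [W m-2]: absorbed shortwave minus outgoing longwave."""
    return S0 * (1.0 - planetary_albedo) / 4.0 - olr


def albedo_increase_for_cooling(delta_t_k: float, feedback: float) -> float:
    """Albedo increase whose shortwave forcing would, at equilibrium, cool the
    surface by delta_t_k [K] for a feedback parameter `feedback` [W m-2 K-1]."""
    required_forcing = feedback * delta_t_k  # [W m-2]
    return required_forcing / (S0 / 4.0)


if __name__ == "__main__":
    print(f"N = {toa_imbalance(0.30, 238.0):.3f} W m-2")
    # Hypothetical example: remove a 0.5 K warm bias with an assumed 1.5 W m-2 K-1 feedback.
    print(f"albedo increase: {albedo_increase_for_cooling(0.5, 1.5) * 100:.2f} percentage points")
```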
3. Parallel Worlds
 Usually there are multiple routes to a given goal during the tuning process, even within the rather limited parameter set that we use. In building MPI-ESM-LR we chose one such route. To explore how this choice influenced our results, we have produced three alternatively tuned models, here named World 1, 2 and 3. Each alternative world was created by first perturbing one parameter and then adjusting another parameter until the TOA imbalance and liquid water path were again close to those of the default model. This was done by iterating the first two steps of the tuning process outlined in the previous section. For the purpose of demonstration, this process was somewhat simplified relative to the normal tuning procedure, which would involve more parameters and observational targets, as well as long simulations in coupled mode. The alternative worlds were motivated by previous studies by Klocke et al., Bender and Stainforth et al.:
1. Klocke et al. studied a perturbed physics ensemble using ECHAM5 and found that the parameter controlling the convective mass flux above the level of non-buoyancy explained most of the variability in climate sensitivity within their ensemble. Here we increase this parameter by about 50 percent and, as in their study, also increase the shallow convection entrainment rate to compensate for the loss of cloudiness. The resulting entrainment rate is closer to estimates from large-eddy simulations by Siebesma and Cuijpers. Extending the results of Klocke et al., World 1 would be expected to have a climate sensitivity to a doubling of CO2 increased from 3 K to 4–5 K, which, as we shall see, turned out not to be the case.
2. Bender lowered the planetary albedo of the CAM3 climate model from a value close to the ERBE satellite dataset to one resembling the CERES data. This was done partly by increasing a parameter analogous to the convective cloud conversion rate from cloud water to rain (see Figure 3). Bender found only a small increase, from 2.26 K to 2.50 K, in the model's climate sensitivity to a doubling of CO2. In World 2 we lower the planetary albedo by about 1 percent by doubling the conversion rate, and compensate for the resulting shift in the TOA imbalance by lowering the liquid cloud homogeneity parameter.
3. Stainforth et al. created a large perturbed physics ensemble based on the HadAM3 climate model, finding climate sensitivities to a doubling of CO2 ranging between 2 and 11 K. They found that the ensemble members with sensitivities above 8 K were all related to perturbations of the convective cloud entrainment rate. In World 3 we lower the entrainment rate for deep convection to less than one third of ECHAM6's standard value. As in World 2, we compensate for the change in the radiation balance using the liquid cloud homogeneity parameter.
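The perturb-then-compensate procedure used to build the alternative worlds can be sketched as a one-dimensional root-finding problem: after perturbing one parameter, search for the value of a second parameter that restores the TOA imbalance. The sketch below uses a secant iteration on a hypothetical toy emulator whose coefficients are invented; it is not ECHAM6, and it matches only the TOA imbalance, not the liquid water path. In practice each evaluation is a model simulation and the adjustment is done by hand.

```python
# Hypothetical toy emulator of the global-mean TOA imbalance [W m-2] as a
# function of two tuning parameters. Coefficients are invented; they are
# chosen only so that lowering the liquid-cloud homogeneity compensates a
# doubled convective conversion rate, mimicking the World 2 recipe.
def toa_imbalance(conv_rate: float, homogeneity: float) -> float:
    dc = (conv_rate - 2.0e-4) / 2.0e-4   # relative change in conversion rate
    dh = homogeneity - 0.77              # change in homogeneity factor
    return 0.5 - 1.5 * dc - 60.0 * dh - 200.0 * dh ** 2


def retune(residual, x0: float, x1: float, tol: float = 1e-3, max_iter: int = 20) -> float:
    """Secant iteration on the compensating parameter until |residual(x)| < tol."""
    f0, f1 = residual(x0), residual(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            return x1
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f0, f1 = f1, residual(x1)
    raise RuntimeError("retuning did not converge")


if __name__ == "__main__":
    target = toa_imbalance(2.0e-4, 0.77)   # imbalance of the default toy model
    # Step 1: perturb one parameter (double the conversion rate).
    # Step 2: adjust the compensating parameter until the imbalance is restored.
    h_new = retune(lambda h: toa_imbalance(4.0e-4, h) - target, x0=0.77, x1=0.76)
    print(f"retuned homogeneity: {h_new:.3f}")
```

In this toy setting the homogeneity factor ends up near 0.74; the number carries no information about ECHAM6.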
Table 1 shows the final parameter settings along with key global mean properties from simulations with prescribed historical sea surface temperatures (SSTs) and sea ice concentrations (AMIP). Next, we couple these alternative worlds to the ocean model component, assess the alternative models' performance in representing the mean state and Tropical variability, and finally study their climate sensitivity to a doubling of CO2 when coupled to a mixed-layer ocean.
Table 1. Overview of Parameter Settings and Global Mean Properties of the Standard ECHAM6 and Three Alternative Models (a)
Parameter settings:
Cloud mass-flux above level of non-buoyancy
Entrainment rate for shallow convection [m⁻¹]
Entrainment rate for deep convection [m⁻¹]
Conversion rate to rain in convective clouds [s⁻¹]
Homogeneity of liquid clouds
Global mean properties:
Total cloud cover [percent]
Water vapor path [kg/m²]
Liquid water path [g/m²]
Ice water path [g/m²]
Total precipitation [mm/d]
Surface downwelling shortwave [W/m²]
Surface downwelling longwave [W/m²]
Surface net shortwave [W/m²]
Surface net longwave [W/m²]
Surface sensible heat flux [W/m²]
Surface latent heat flux [W/m²]
Energy to melt snow [W/m²]
Greenhouse effect [W/m²]
Reflected shortwave TOA [W/m²]
Planetary albedo [percent]
Shortwave cloud radiative effect at TOA [W/m²]
Longwave cloud radiative effect at TOA [W/m²]
Shortwave net at TOA [W/m²]
Longwave net at TOA [W/m²]
Imbalance at TOA [W/m²]
(a) Values are for AMIP runs with prescribed SSTs and sea-ice evolution, averaged over the years 1976–2005. Empty fields mean that a parameter is set to its default value. Reported observed values are from Stevens and Schwartz.
3.1. Coupled Model Climate
 After tuning the models to similar TOA radiation imbalances in AMIP mode, we coupled the various versions of ECHAM6 to the MPIOM ocean component and simulated 200 years under pre-industrial conditions (Figure 6). The first interesting point is that the alternative worlds approach equilibrium global mean surface air temperatures relatively quickly, but the temperatures at which they equilibrate differ considerably, by more than 1 K between the coldest (World 3) and the warmest (World 1). After 200 years World 1 is still warming, and we estimate that it will warm by a further 0.1 to 0.2 K, while the other two have come into balance with a model system energy leakage of about 0.5 W m⁻² (Section 2.3). This illustrates the point made earlier about the need to control climate drift in coupled mode: the pre-industrial temperature cannot be predicted with sufficient precision from the AMIP simulations.
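Estimating how much further a run will warm requires extrapolating a temperature series that has not yet equilibrated. One simple way to do this, assumed here for illustration and not necessarily how the estimate above was made, is to treat the approach as exponential relaxation, dT/dt = (T_eq − T)/τ, and regress the annual tendency on temperature; T_eq is where the fitted tendency vanishes. For real, noisy annual means the tendency is dominated by interannual variability, so one would smooth the series or fit over multi-decadal windows first.

```python
import numpy as np


def extrapolate_equilibrium(annual_mean_temps):
    """Assume relaxation dT/dt = (T_eq - T) / tau and fit the tendency
    against temperature; T_eq is where the fitted tendency vanishes."""
    temps = np.asarray(annual_mean_temps, dtype=float)
    tendency = np.diff(temps)                    # [K per year]
    midpoint = 0.5 * (temps[1:] + temps[:-1])    # temperature at each step's midpoint
    slope, intercept = np.polyfit(midpoint, tendency, 1)
    t_eq = -intercept / slope
    tau_years = -1.0 / slope
    return t_eq, tau_years


if __name__ == "__main__":
    years = np.arange(200)
    synthetic = 14.0 - 2.0 * np.exp(-years / 60.0)   # made-up warming curve
    t_eq, tau = extrapolate_equilibrium(synthetic)
    print(f"T_eq = {t_eq:.2f}, tau = {tau:.1f} yr, "
          f"remaining warming = {t_eq - synthetic[-1]:.2f} K")
```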
 The mean ocean circulation was only weakly affected by the parameter changes studied here. A significant impact was found in World 3, where the Atlantic meridional overturning circulation strengthened by 1–2 Sv and the associated northward ocean energy transport increased by 5–10 percent. The colder high latitudes in World 3 lead to more dense-water formation, which enhances ocean deep convection. We often find this response during cooling transients, and the enhanced circulation usually relaxes as the model approaches equilibrium.
 The response of the annual mean Arctic sea ice volume and extent is correlated with the global mean temperature. In particular, the colder World 3 exhibits a rapid growth of sea ice volume to more than 30·10¹² m³ towards the end of the simulation. In Section 2.4 we explained how we can tune the Arctic sea ice volume by adjusting geometric factors controlling the growth of ice. However, even with extreme settings it would not have been possible to reduce the sea ice volume in World 3 to our target value of about 20–25·10¹² m³, hence the necessity of tuning the temperature in order to be able to represent sea ice.
3.2. Evaluating the Climate
 Biases relative to observations and reanalyses in our standard evaluation of the AMIP simulations with fixed SSTs are remarkably similar across the four model versions, in geographical and vertical structure as well as in magnitude. If anything, pressure biases in the Southern Hemisphere are slightly reduced in World 2 and 3, and upper tropospheric temperatures are best represented by World 1 and 3. During the development of MPI-ESM-LR, we regularly referred to a set of diagnostics proposed by Reichler and Kim to get an overview of model performance and to quickly compare different model versions against each other. Following Stevens et al. (manuscript in preparation, 2012), a selection of these indices is shown in Figure 7. The indices are normalized relative to the average performance of the CMIP3 coupled model ensemble in representing the corresponding variables.
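The Reichler and Kim indices can be sketched as an area-weighted squared model-minus-observation difference, scaled by the observed interannual variance at each grid point and then normalized by the mean error of a reference ensemble (CMIP3 here). The code below follows our reading of that approach; the cosine-latitude weighting, the function names and the toy fields are assumptions, and details such as the choice of variables and observational datasets are omitted. An index below 1 means smaller errors than the average CMIP3 model.

```python
import numpy as np


def cos_lat_weights(lats_deg, nlon):
    """Normalized area weights for a regular latitude-longitude grid."""
    w = np.repeat(np.cos(np.deg2rad(lats_deg))[:, None], nlon, axis=1)
    return w / w.sum()


def squared_error(model, obs, obs_variance, weights):
    """Area-weighted squared model-observation difference, scaled by the
    observed interannual variance at each grid point."""
    return float(np.sum(weights * (model - obs) ** 2 / obs_variance))


def performance_index(model_e2, reference_e2s):
    """Normalize by the mean error of a reference ensemble (e.g. CMIP3)."""
    return model_e2 / float(np.mean(reference_e2s))


if __name__ == "__main__":
    lats = np.linspace(-87.5, 87.5, 36)
    w = cos_lat_weights(lats, 72)
    obs = np.zeros((36, 72))
    var = np.ones((36, 72))
    e2 = squared_error(obs + 0.5, obs, var, w)      # toy model with a uniform 0.5 bias
    print(performance_index(e2, [0.2, 0.3, 0.4]))   # hypothetical reference errors
```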
 It is no surprise that the AMIP simulations generally perform better than the coupled simulations in representing the mean climate, as they are helped by the prescribed SSTs and sea ice. Among the AMIP simulations, World 1 performs slightly worse than the rest, and does so consistently across most of the individual indices, while the tables turn for the coupled simulations. In our experience, the latter is mainly an artifact of comparing pre-industrial simulations with present-day observations, combined with the World 1 coupled simulation being the warmest (Figure 6), which yields better scores in the temperature-sensitive indices. This temperature effect can also be seen in the cold World 3 simulation, and when comparing MPI-ESM-LR scores for pre-industrial and present-day conditions. While the Reichler and Kim diagnostics are certainly helpful for quickly getting an overview of a model's performance in representing the mean climate state, they do little to elucidate the processes responsible for better or worse model performance.
 An interesting and challenging issue in MPI-ESM is the distribution of Tropical precipitation over land versus ocean. The model prefers to precipitate over the ocean, whereas observations indicate a stronger preference for precipitation over land. The problem is interesting because it touches on our understanding of deep convective processes, and it is becoming a major issue with the inclusion of dynamic vegetation in the Earth system model, as underestimated precipitation over land leads to biases in the vegetation which may ultimately degrade the representation of the carbon cycle [Collins et al., 2011; Brovkin et al., submitted manuscript, 2012].
 The alternative worlds exhibit significant differences in the distribution of precipitation over Maritime Southeast Asia (Figure 8). Whereas standard ECHAM6, and to an even larger extent World 1, prefer to precipitate over the ocean, World 2 and in particular World 3 do produce precipitation over the larger islands of Sumatra and Borneo, in better agreement with observations. Note that these detailed differences are not visible in the Reichler and Kim precipitation performance index.
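The land-ocean contrast in Figure 8 can be condensed into a single number: the share of area-integrated precipitation that falls over land within a region. The function below is a hypothetical helper, not part of our evaluation suite; the region, grid and land mask are assumptions, and model and observational fields would need to be on the same grid for the shares to be comparable.

```python
import numpy as np


def land_precip_share(precip, land_mask, lats_deg):
    """Fraction of area-integrated precipitation that falls on land.

    precip:    (nlat, nlon) time-mean precipitation [mm/d]
    land_mask: (nlat, nlon) land fraction in [0, 1]
    lats_deg:  (nlat,) latitudes of the grid rows
    """
    weights = np.cos(np.deg2rad(lats_deg))[:, None]   # broadcasts over longitude
    total = np.sum(weights * precip)
    over_land = np.sum(weights * precip * land_mask)
    return float(over_land / total)


if __name__ == "__main__":
    lats = np.array([-5.0, -2.5, 2.5, 5.0])
    mask = np.zeros((4, 6))
    mask[:, :3] = 1.0                      # toy mask: western half is land
    precip = np.where(mask > 0, 4.0, 8.0)  # toy field that rains more over the ocean
    print(land_precip_share(precip, mask, lats))
```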
3.3. Tropical Variability
 Tropical inter-annual variability is dominated by the El Niño-Southern Oscillation (ENSO), while intra-seasonal variability is dominated by the Madden-Julian Oscillation (MJO). Here we investigate the influence of the parameter changes on the model representation of these phenomena.
 Earlier versions of our climate model, e.g., ECHAM5/MPIOM, had too pronounced ENSO variability, while MPI-ESM-LR is much closer to the observations (Figure 9). ENSO was not significantly affected by the changes made to convective parameters in World 1 and 3, but the variability was consistently reduced in World 2, in which we doubled the convective cloud conversion rate to rain from 2·10⁻⁴ s⁻¹ to 4·10⁻⁴ s⁻¹. This is interesting because the same parameter was also doubled from ECHAM5/MPIOM to MPI-ESM-LR, from 1·10⁻⁴ s⁻¹ to 2·10⁻⁴ s⁻¹, in order to bring OLR into closer agreement with the more recent CERES satellite observations. This means that at least part of the improvement in ENSO could have been achieved simply by chance. Other features of ENSO, such as the shape of the frequency spectrum, the annual cycle and the skewness, were insensitive to the changes made in the alternative worlds. Other studies have shown sensitivity of ENSO characteristics to the representation of atmospheric convection [e.g., Neale et al., 2008; Kim et al., 2011a], although a direct connection to our findings is difficult to establish because of the different model formulations.
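ENSO amplitude is commonly summarized by the standard deviation of monthly SST anomalies averaged over the Niño-3.4 region (5°S–5°N, 170°W–120°W). The sketch below is a minimal version under stated assumptions: the input is an already area-averaged monthly series covering whole years, anomalies are taken relative to the mean annual cycle, and no detrending is applied. It is not necessarily the diagnostic behind Figure 9, and the function name is hypothetical.

```python
import numpy as np


def nino34_std(sst_monthly):
    """Standard deviation [K] of monthly Nino-3.4 SST anomalies.

    sst_monthly: 1-D series of area-mean SST over 5S-5N, 170W-120W,
    with a length that is a multiple of 12 (whole years).
    """
    sst = np.asarray(sst_monthly, dtype=float)
    if sst.size % 12:
        raise ValueError("series must contain whole years")
    climatology = sst.reshape(-1, 12).mean(axis=0)         # mean annual cycle
    anomalies = sst - np.tile(climatology, sst.size // 12)  # remove annual cycle
    return float(anomalies.std())


if __name__ == "__main__":
    months = np.arange(480)
    # Toy series: annual cycle plus a 4-year oscillation of amplitude 1 K.
    toy = 26.0 + np.cos(2 * np.pi * months / 12) + np.sin(2 * np.pi * months / 48)
    print(f"Nino-3.4 std: {nino34_std(toy):.3f} K")
```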
 Tropical intraseasonal variability, which is dominated by the MJO, is also poorly represented by most state-of-the-art GCMs. T. Crueger et al. (The Madden-Julian Oscillation in ECHAM6 and the introduction of a MJO metric, submitted to Journal of Climate, 2012) proposed quantifying the MJO in ECHAM by characterizing 1) the strength of the convective signal and 2) the eastward propagation. The former is derived from a multivariate empirical orthogonal function analysis of 20–100 day band-pass filtered tropospheric winds and outgoing longwave radiation (OLR), with OLR serving as an indicator of deep convective clouds and precipitation. The eastward propagation is derived from the relative strength of eastward- to westward-propagating waves in the same band-pass filtered data set.
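A common way to quantify eastward propagation is to compare spectral power at eastward and westward wavenumber-frequency pairs within the intraseasonal band. The sketch below computes such an east/west power ratio for periods of 20–100 days and zonal wavenumbers 1–3 from an equatorial (time, longitude) field. It illustrates the idea only and is not the metric of Crueger et al., whose exact definition (including the multivariate EOF analysis) is not reproduced here; the band limits and the function name are assumptions.

```python
import numpy as np


def east_west_power_ratio(field, dt_days=1.0, min_period=20.0, max_period=100.0,
                          max_wavenumber=3):
    """Ratio of eastward to westward spectral power in an intraseasonal band.

    field: (time, longitude) array on an evenly spaced, global longitude grid,
    e.g. equatorially averaged OLR or zonal-wind anomalies.
    """
    nt, nx = field.shape
    power = np.abs(np.fft.fft2(field - field.mean())) ** 2
    freq = np.fft.fftfreq(nt, d=dt_days)[:, None]          # cycles per day
    wavenumber = np.fft.fftfreq(nx, d=1.0 / nx)[None, :]   # zonal wavenumber
    in_band = ((np.abs(freq) >= 1.0 / max_period) & (np.abs(freq) <= 1.0 / min_period)
               & (np.abs(wavenumber) >= 1) & (np.abs(wavenumber) <= max_wavenumber))
    # With numpy's FFT sign convention, a wave cos(k*x - w*t) with k, w > 0
    # (phase moving east) projects onto coefficients where the frequency and
    # wavenumber have opposite signs.
    east = power[in_band & (freq * wavenumber < 0)].sum()
    west = power[in_band & (freq * wavenumber > 0)].sum()
    return float(east / max(west, np.finfo(float).tiny))  # guard against 0


if __name__ == "__main__":
    t = np.arange(720)[:, None]   # days
    x = np.arange(144)[None, :]   # 2.5-degree longitude grid
    eastward = np.cos(2 * np.pi * (x / 144 - t / 45))   # wavenumber 1, 45-day period
    print(f"east/west ratio: {east_west_power_ratio(eastward):.3g}")
```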
 MJO characteristics of the parallel worlds are shown in Figure 10. All model versions exhibit a weaker convective signal than observed (26%), and they all have less distinct eastward wave propagation than observed (3.5). The coupled model simulations (diamonds) show roughly twice the convective strength of the AMIP simulations, as well as enhanced eastward propagation (Crueger et al., submitted manuscript, 2012). In both the coupled and the AMIP simulations, World 3 shows practically the same strength of westward and eastward propagating waves, in contrast to observations. Weak eastward propagation is also an issue in World 2. This is likely a consequence of the parameter changes making the deep convection less sensitive to environmental conditions [Kim et al., 2011a]. The two models that perform worst in terms of the MJO, World 2 and 3, are the best at representing the distribution of tropical precipitation between land and ocean in Maritime Southeast Asia (Figure 8).
3.4. Climate Sensitivity
 Equilibrium climate sensitivity to a doubling of CO2 is a standard measure of a model's sensitivity to external forcing. We have performed simulations of the parallel worlds coupled to a 50 m deep mixed-layer ocean, with pre-industrial and with doubled CO2 concentrations. In the mixed-layer ocean no currents are explicitly resolved; instead, the ocean energy transports are prescribed as inferred from the AMIP simulations. The simulations are run for 50 years, and the differences presented are evaluated over the last 30 years of the simulations.
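Why 50-year mixed-layer runs suffice can be seen from a global slab with heat capacity C and feedback parameter λ, which relaxes towards F/λ with e-folding time τ = C/λ. The sketch below assumes a doubled-CO2 forcing of 3.7 W m⁻², λ = 1.2 W m⁻² K⁻¹ and an ocean fraction of 0.7, and ignores land heat capacity and the prescribed ocean heat transports. It gives τ of a little under four years, so the average over the last 30 years is within a fraction of a percent of the equilibrium response.

```python
import math

RHO_SW = 1025.0                       # sea-water density [kg m-3]
CP_SW = 3990.0                        # specific heat of sea water [J kg-1 K-1]
SECONDS_PER_YEAR = 365.25 * 86400.0


def efolding_time_years(feedback, depth_m=50.0, ocean_fraction=0.7):
    """Relaxation time C / lambda of a global slab ocean [years]."""
    heat_capacity = RHO_SW * CP_SW * depth_m * ocean_fraction   # [J m-2 K-1]
    return heat_capacity / feedback / SECONDS_PER_YEAR


def slab_warming(t_years, forcing, feedback, **slab):
    """Analytic solution of C dT/dt = F - lambda * T after a step forcing."""
    tau = efolding_time_years(feedback, **slab)
    return forcing / feedback * (1.0 - math.exp(-t_years / tau))


if __name__ == "__main__":
    F2X, LAMBDA = 3.7, 1.2     # assumed forcing [W m-2] and feedback [W m-2 K-1]
    # Approximate each annual mean of years 21-50 by its mid-year value.
    last_30 = [slab_warming(t + 0.5, F2X, LAMBDA) for t in range(20, 50)]
    print(f"tau = {efolding_time_years(LAMBDA):.1f} yr")
    print(f"mean of years 21-50: {sum(last_30) / 30:.3f} K, "
          f"equilibrium: {F2X / LAMBDA:.3f} K")
```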
 The patterns of temperature change are very similar across the model versions (Figure 11). Warming is stronger over land than over the oceans, peaks at the poles, and is stronger in the Northern Hemisphere than in the Southern Hemisphere. In particular, the temperature response in World 2 closely resembles the standard ECHAM6 response. The zonally averaged temperature response (Figure 12) further quantifies the similarities among the model versions, although World 3 stands out with 0.5 to 1 K more warming than the others throughout the Tropics and Sub-tropics. In the Arctic, all three parallel worlds exhibit slightly more warming than the standard ECHAM6 model.
 A simple analysis of how the parallel worlds approach equilibrium after the instantaneous CO2 doubling can provide some hints as to why the sensitivity differs among the models (Figure 13). Gregory et al. argue that the extrapolated intercept with the y-axis is a measure of the atmospheric adjusted forcing from the CO2 doubling, while the intersection with the x-axis is a good estimate of the equilibrium climate sensitivity. The slope is therefore related to the total climate system feedback. World 2 and 3 both have adjusted CO2 forcings that are 10–20 percent higher than that of the standard ECHAM6 model. This is related primarily to a fast reduction of cloudiness as a direct response to the increase in CO2 [Gregory and Webb, 2008], which is more pronounced in World 2 and 3. The slope is steeper in World 1 and 2 than in World 3 and the standard ECHAM6 model. Together, this indicates that the higher sensitivity of World 3 relative to the standard model is mainly a consequence of a stronger adjustment to CO2 forcing, and less a consequence of temperature-dependent feedbacks.
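The Gregory et al. regression amounts to a least-squares line through annual means of TOA imbalance N plotted against warming ΔT. The sketch below uses made-up data, and the function name is hypothetical. The y-intercept is the adjusted forcing, the slope is the total feedback (negative, so a steeper line means a stronger restoring feedback and a lower sensitivity), and the x-intercept is the equilibrium warming. With the slope held fixed, a 10–20 percent larger forcing translates directly into a 10–20 percent larger equilibrium warming, which is the mechanism invoked for World 3.

```python
import numpy as np


def gregory_fit(delta_t, toa_imbalance):
    """Least-squares line N = F + lambda * dT through annual means after an
    abrupt CO2 doubling; returns (F, lambda, equilibrium dT)."""
    slope, intercept = np.polyfit(delta_t, toa_imbalance, 1)
    forcing = intercept            # y-axis intercept: adjusted forcing [W m-2]
    feedback = slope               # total feedback [W m-2 K-1], negative if stable
    ecs = -intercept / slope       # x-axis intercept: equilibrium warming [K]
    return float(forcing), float(feedback), float(ecs)


if __name__ == "__main__":
    dT = np.linspace(0.5, 3.0, 20)          # made-up annual-mean warming [K]
    N = 4.0 - 1.25 * dT                     # made-up TOA imbalance [W m-2]
    F, lam, ecs = gregory_fit(dT, N)
    print(f"F = {F:.2f} W m-2, lambda = {lam:.2f} W m-2 K-1, ECS = {ecs:.2f} K")
```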
 Parameter tuning is the last step in the climate model development cycle, and it invariably involves making sequences of choices that influence the behavior of the model. Some of the behavioral changes are desirable, and even targeted, but others may be side effects of the tuning. The choices we make naturally depend on our preconceptions, preferences and objectives. We choose to tune our model because the alternatives, either drifting away from the known climate state or introducing flux corrections, are less attractive. Within the foreseeable future climate model tuning will continue to be necessary, as the prospects of constraining the relevant unresolved processes with sufficient precision are not good.
 Climate model tuning has developed well beyond just controlling global mean temperature drift. Today, we tune several aspects of the models, including the extratropical wind and pressure fields, sea-ice volume and to some extent cloud-field properties. By doing so we clearly run the risk of building the models' performance upon compensating errors, and the practice of tuning partly masks these structural errors. As one continues to evaluate the models, sooner or later these compensating errors will become apparent, but they may prove tedious to rectify without jeopardizing other aspects of the model that have been adjusted to them. To aid the long-term development of our model we choose a tuning strategy with only a small number of parameter changes between different model versions and resolutions, such that it will be easier to identify and understand how the model formulation can be improved.
 Often, we are confronted with trade-offs when tuning a coupled climate model. For example, two of the parallel worlds showed improvements in the precipitation distribution between land and ocean in the Tropics, but they both exhibit weakened modes of tropical variability. This particular trade-off is becoming increasingly difficult now that the model is coupled to interactive vegetation. The problem suggests that underlying structural errors in the model prohibit us from achieving fidelity with both observed aspects at the same time by means of parametric tuning alone. Thereby the practice helps us to identify the processes that could improve the model by the development of new or improved parameterizations, and as such is integral to the overall model development cycle [Jakob, 2010].
 The model tuning process at our institute is artisanal in character, in that both the adjustment of parameters at each tuning iteration and the evaluation of the resulting candidate models are done by hand, as is done at most other modeling centers. It is, however, at least conceptually possible to automate this process and find optimal sets of parameters with respect to certain targets. When considering model biases that appear on long time-scales (months to years), one option is to use the full model and search through parameter-space seeking areas in which errors are minimized [Jackson et al., 2008; Järvinen et al., 2010]. Alternatively, one can use a relatively small number of model runs to build a statistical model, or emulator, of the error as a function of parameter space to obtain parameter sets that minimize model error [Neelin et al., 2010]. On the other hand, many of the parameters we adjust have relatively fast impacts on the model state, making it appealing to look at errors in very short (hours) forecasts starting from realistic initial conditions. If a data assimilation system is available for the model one can use the average change to the model introduced with each new set of observations as a measure of a specific model configuration's disagreement with observations [Rodwell and Palmer, 2007], and further, such a data-assimilation system can be exploited to find parameter sets that minimize short-term errors [Annan et al., 2005; Järvinen et al., 2012]. We are experimenting with several of these strategies, but have yet to find something significantly better or faster than tuning by hand, nor are we aware of any other climate modeling center that operates differently. We suspect that this relates to the difficulty in making unsupervised compromises: Any such objective tuning algorithm requires a subjective choice of a cost function and this involves weighting trade-offs against one another, which is difficult to do ahead of time.
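The difficulty of unsupervised compromises can be made concrete with a toy cost function. Below, two invented error metrics of a single hypothetical parameter are minimized at different values, so no setting satisfies both, and the optimum found by a grid search depends entirely on the weights chosen in advance. This illustrates the argument only; it is not an algorithm used at our institute or elsewhere.

```python
import numpy as np


# Two invented error metrics of a single tuning parameter p in [0, 1];
# each is minimized at a different p, so they cannot both be zero.
def error_a(p):
    return (p - 0.3) ** 2   # e.g. a hypothetical land/ocean precipitation error


def error_b(p):
    return (p - 0.7) ** 2   # e.g. a hypothetical tropical-variability error


def best_parameter(weight_a, weight_b, grid=np.linspace(0.0, 1.0, 101)):
    """Grid search for the minimizer of a weighted sum of the two errors."""
    cost = weight_a * error_a(grid) + weight_b * error_b(grid)
    return float(grid[np.argmin(cost)])


if __name__ == "__main__":
    for weights in [(1, 1), (3, 1), (1, 3)]:
        print(weights, "->", round(best_parameter(*weights), 2))
```

With equal weights the optimum is 0.5; weighting the first metric three times as heavily moves it to 0.4, and the reverse weighting moves it to 0.6.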
 The climate sensitivities of the worlds we considered spanned only one quarter of the range of climate sensitivities among the CMIP3 models. World 1 shows a small decrease, contrary to the findings of Klocke et al., and the increase in climate sensitivity in World 3 is only modest compared with the very high climate sensitivities found by Stainforth et al. Obviously, a perturbed physics ensemble with only four members is not going to reveal the full range of parameter-dependent uncertainty spanned in more systematic studies; in particular, certain combinations of parameters may yield larger impacts than found here. It is possible, however, that perturbed physics studies, which out of necessity are performed with low-resolution climate models, are particularly sensitive to parameter changes. Tests with ECHAM do confirm that the lack of an increased climate sensitivity in World 1 was due to the increase from 19 to 47 vertical levels relative to the configuration used by Klocke et al. This could indicate that complex interactions between the resolved and the parameterized model components can introduce erratic behavior at low resolution.
 One of the few tests we can expose climate models to is whether they are able to represent the observed temperature record from the dawn of industrialization until the present. Models are surprisingly skillful in this respect [Räisänen, 2007], considering the large range in climate sensitivities among them, an ensemble behavior that has been attributed to compensation by the 20th century anthropogenic forcing [Kiehl, 2007]: models that have a high climate sensitivity tend to have a weak total anthropogenic forcing, and vice versa. A large part of the inter-model spread in 20th century forcing was further found to originate in differing aerosol forcings. It seems unlikely that the anti-correlation between forcing and sensitivity simply happened by chance. Rational explanations are that 1) modelers somehow changed their climate sensitivities, 2) they deliberately chose suitable forcings, or 3) there exists an intrinsic compensation such that models with strong aerosol forcing also have a high climate sensitivity. Support for the latter is found in studies showing that parametric model tuning can influence the aerosol forcing [Lohmann and Ferrachat, 2010; Golaz et al., 2011]. Understanding this complex issue is well beyond our scope, but it seems appropriate to linger for a moment on the question of whether we deliberately changed our model to better agree with the 20th century temperature record.
 The MPI-ESM was not tuned to better fit the 20th century. In fact, we only had the capability to run the full 20th century simulation according to the CMIP5 protocol after the point in time when the model was frozen. Yet we were in the fortunate situation that MPI-ESM-LR performed acceptably in this respect, and we did have good reasons to expect this in advance because the predecessor was capable of doing so. During the development of MPI-ESM-LR we worked under the perception that two of our tuning parameters had an influence on the climate sensitivity, namely the convective cloud entrainment rate and the convective cloud mass flux above the level of non-buoyancy, so we decided to minimize changes to these parameters relative to the previous model. The results presented here show that this perception was not correct, as these parameters had only small impacts on the climate sensitivity of our model.
 Climate models' ability to simulate the 20th century temperature increase with fidelity has become something of a show-stopper: a model unable to reproduce the 20th century would probably not be published, and as such this test has effectively lost its purpose as a measure of model quality. Most other observational datasets sooner or later meet the same fate, at least after the first time they are applied for model evaluation. That is not to say that climate models can be readily adapted to fit any dataset, but once we are aware of the data we will compare model output with it and invariably make decisions in the model development on the basis of the results. Rather, our confidence in the results provided by climate models is gained through the development of a fundamental physical understanding of the basic processes that create climate change. More than a century ago it was first realized that increasing the atmospheric CO2 concentration leads to surface warming [Arrhenius, 1896], and today the underlying physics and feedback mechanisms are reasonably well understood (while quantitative uncertainty in climate sensitivity is still large). Coupled climate models are just one of the tools applied in gaining this understanding [Oreskes et al., 1994; Bony et al., 2012].
 In this paper we have attempted to illustrate the tuning process as it is currently done at our institute. Our hope is thereby to help de-mystify the practice, and to demonstrate what can and cannot be achieved. The impacts of the alternative tunings presented were smaller than we expected in advance of this study, which in many ways is reassuring. We must emphasize that our paper presents only a small glimpse of the actual development and evaluation involved in preparing a comprehensive coupled climate model, a process that continues to evolve as new datasets emerge, model parameterizations improve, additional computational resources become available, our interests, perceptions and objectives shift, and we learn more about our model and the climate system itself.
 We are grateful for constructive comments and suggestions from Jin-Song von Storch, Marc Salzmann, Frida Bender, Ákos Horváth, Christian Jakob and an anonymous reviewer. We acknowledge the modeling groups, the Program for Climate Model Diagnosis and Intercomparison (PCMDI) and the WCRP's Working Group on Coupled Modelling (WGCM) for making available the CMIP3 and CMIP5 multi-model datasets. The research leading to these results has received funding from the European Union, Seventh Framework Programme (FP7/2007–2013) under grant agreement 244067. This work was supported by the Max Planck Gesellschaft (MPG), and computational resources were provided by Deutsche Klima Rechen Zentrum (DKRZ).