Single-column models (SCM) are useful test beds for investigating the parameterization schemes of numerical weather prediction and climate models. The usefulness of SCM simulations is limited, however, by the accuracy of the best estimate large-scale observations prescribed. Errors in estimating the observations will result in uncertainty in modeled simulations. One method to address the modeled uncertainty is to simulate an ensemble where the ensemble members span observational uncertainty. This study first derives an ensemble of large-scale data for the Tropical Warm Pool International Cloud Experiment (TWP-ICE) based on an estimate of a possible source of error in the best estimate product. These data are then used to carry out simulations with 11 SCM and two cloud-resolving models (CRM). Best estimate simulations are also performed. All models show that moisture-related variables are close to observations and there are limited differences between the best estimate and ensemble mean values. The models, however, show different sensitivities to changes in the forcing, particularly when weakly forced. The ensemble simulations highlight important differences in the surface evaporation term of the moisture budget between the SCM and CRM. Differences are also apparent between the models in the ensemble mean vertical structure of cloud variables, while for each model, cloud properties are relatively insensitive to forcing. The ensemble is further used to investigate cloud variables and precipitation and identifies differences between CRM and SCM, particularly for relationships involving ice. This study highlights the additional analysis that can be performed using ensemble simulations and hence enables a more complete model investigation compared to using the more traditional single best estimate simulation only.
 The Tropical Warm Pool International Cloud Experiment (TWP-ICE) took place around Darwin from 20 January to 13 February 2006 [May et al., 2008]. The data collected during the experiment provide an opportunity to investigate several different states of tropical convection. The experiment collected sufficient information to derive the large-scale heat, momentum, and moisture budgets [Xie et al., 2004] as well as detailed information on the state of the smaller scale convection and associated clouds. Such data sets are commonly used in the modeling community to carry out process-oriented studies, in particular applying cloud-resolving models (CRM) and single-column models (SCM). One of the primary motivations for TWP-ICE was to enable the improvement of global climate models (GCM), which are known to be deficient in the representation of cloud and rainfall, particularly associated with tropical convection. The international research community has conducted a suite of multimodel studies for TWP-ICE. A hierarchy of experiments enables the investigation of model errors as discussed in J. Petch et al. (Evaluation of intercomparisons of four different types of model simulating TWP-ICE, submitted to Quarterly Journal of the Royal Meteorological Society, 2012) and includes GCM [Lin et al., 2012] and Limited Area Models [Zhu et al., 2012] forced with European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis as well as a CRM study [Fridlind et al., 2012] performing simulations driven by a single “best estimate” large-scale budget data set [Xie et al., 2010]. This paper reports on the SCM component of the overall modeling strategy. One innovation applied here will be the use of an ensemble of SCM simulations to elucidate uncertainties in the estimation of model errors and to explore model sensitivities to changes in the data set driving the model simulations.
 The investigation of model shortcomings through SCMs is a well-used method in the model development research community. Model development studies, which include a SCM component, have been instigated by the Global Energy and Water-Cycle Experiment (GEWEX) Cloud System Study (GCSS) [Randall et al., 2003] in conjunction with the U.S. Department of Energy Atmospheric System Research (ASR) to investigate a wide range of test cases including deep convection over the tropical ocean using data from the Tropical Ocean Global Atmosphere (TOGA) Coupled Ocean-Atmosphere Response Experiment (COARE) [Webster and Lukas, 1992] intensive observation period [e.g., Woolnough et al., 2010; Bechtold et al., 2000] and convection over land exploiting extensive observations [e.g., Grabowski et al., 2006; Xie et al., 2005, 2002; Ghan et al., 2000]. Investigation of the specific problem of the diurnal cycle was conducted by the European Cloud Systems (EUROCS) project and discussed in Guichard et al. . These studies focussed on a limited number of model simulations forced by a single data set, hereafter referred to as the “best estimate” forcing. While best estimates of the large-scale atmosphere are usually derived to depict the most probable state of the large-scale atmosphere, they do contain errors of usually unknown magnitude. These errors complicate the interpretation of the results of SCM simulations, as the discrepancies between the model-simulated fields and observations may be attributed to two sources: an incorrectly prescribed large-scale state or errors in model processes. With only a single realization of the large-scale state, it is impossible to separate these two error sources.
 Ensemble techniques are commonly used in numerical weather prediction (NWP) and climate models to investigate model sensitivities and to determine uncertainty. These ensembles may include perturbed initial conditions or varying model parameters within a limited range. Multimodel ensembles have also been used to provide an estimate of the range of simulations. A limited number of studies also derived ensemble techniques for use in SCM studies. Hack and Pedretti  added random perturbations to the initial conditions of their ensemble simulations and found considerable variations in simulated fields. Similar results were found when modifying the prescribed vertical motion field in a similar manner. Given the bifurcations discussed in Hack and Pedretti , Hume and Jakob [2005, 2007] and Ball and Plant  determined that an ensemble technique was appropriate for SCM. Hume and Jakob  found that after about 18 h of simulation, results were increasingly sensitive to the prescribed forcing rather than differences in the initial conditions. For this reason, this TWP-ICE study uses an ensemble of large-scale forcing.
 The goal of this study is to apply an ensemble SCM technique to the TWP-ICE experiment and to highlight additional opportunities for model evaluation that such a technique may provide. The technique is applied to a wide range of SCMs as well as a small number of CRMs, enabling the investigation of a range of model behaviors. The results from the ensemble simulation will be compared to those of single “best estimate” simulations. It will be shown that a particularly interesting aspect of the use of the ensemble technique in this context is the possibility to study model sensitivities with changing forcing data set. It is shown that the different models exhibit distinctly different ensemble behavior that is not apparent when comparing simulations with a single forcing data set. Section 2 summarizes the experimental design including the methodology used in the derivation of the ensemble large-scale forcing, the case specification, and a description of the models. The main results of the study are discussed in section 3 followed by a summary and the main conclusions in section 4.
2 Experimental Design
 The experiments conducted here use both a best estimate forcing and an ensemble of forcing data sets. The best estimate data set used is that derived by Xie et al.  and is identical to that used in Fridlind et al. . Using an ensemble approach enables a better understanding of model accuracy and allows model sensitivity to be assessed. As this study includes a number of different models, these characteristics are determined separately for each model. This section details the design of the study, including the ensemble forcing design, the case specification, and the participating models.
2.1 Ensemble Design
 A number of techniques exist to derive budgets from observational data collected in field campaigns. Here, the variational analysis technique of Zhang and Lin  is used in the analysis of TWP-ICE observations. This technique provides an estimate of area-averaged atmospheric and surface conditions using a combination of surface observations, vertical profiles of the atmosphere, satellite observations, and numerical model data. The variational analysis process minimizes a cost function for the heat, moisture, and momentum budgets using constraints of top of atmosphere and surface energy and moisture.
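Schematically, and only as a generic sketch of a constrained variational analysis rather than the exact formulation of Zhang and Lin, the adjustment can be written as the minimization of a weighted misfit between the analyzed state and the observed state, subject to column-integrated budget constraints. The notation below is assumed here for illustration: $\mathbf{x}$ is the analyzed state, $\mathbf{x}_o$ the observed state, $\mathbf{Q}$ an error covariance matrix, angle brackets a vertical mass-weighted integral, $E_s$ the surface evaporation, and $P$ the surface precipitation. The moisture constraint is shown in a simplified form that neglects condensate storage.

```latex
\min_{\mathbf{x}} \; I(\mathbf{x})
  = (\mathbf{x} - \mathbf{x}_o)^{T}\,\mathbf{Q}^{-1}\,(\mathbf{x} - \mathbf{x}_o)
\quad \text{subject to} \quad
\frac{\partial \langle q \rangle}{\partial t}
  + \langle \nabla \cdot (\mathbf{V} q) \rangle = E_s - P
```

Written this way, it is apparent that the prescribed surface precipitation enters the constraint directly, so a change in $P$ must be balanced by a change in the analyzed column moisture convergence.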
 One of the constraints used in the variational analysis method is the domain-average surface rainfall. In the case of the TWP-ICE experiment, this domain-mean surface rainfall is derived from radar data. Compared to the use of rain gauge observations, this improves the spatial representativeness of the estimate, but this comes at the expense of accuracy of the local rainfall estimates as radar measurements need to be converted to rainfall. It has been shown [Zhang et al., 2001] that the surface rainfall has a large effect on the derived forcing data set; for example, the analyzed vertical velocity is very sensitive to rainfall. Furthermore, the derivation of surface rainfall from radar data is also highly complex and liable to large errors [Joss and Waldvogel, 1990]. These errors will have a large effect on the derived forcing data set.
 One method to address uncertainty in large-scale forcing data is to derive an ensemble of forcing data. Only a short summary of the method to derive such an ensemble is given here, with more details provided in the Appendix. The method is principally based on estimates of errors in the rainfall estimates that are a key input to the variational budget analysis. A comparison of radar-derived and rain gauge data is carried out to provide an estimate of the error in the radar estimates of domain-average rainfall. From these error estimates, 100 equally likely alternative domain-mean rainfall time series are calculated. The 100 rainfall time series are then used as inputs to the variational analysis to derive 100 alternative versions of the large-scale state using the same variational technique as is used to derive the best estimate large-scale state. These 100 large-scale states constitute the forcing ensemble used in this study.
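As a rough illustration of the idea, and not of the Appendix method itself, alternative rainfall series might be generated by applying random multiplicative errors to the best estimate series. The function name, the lognormal error model, the value of `rel_error_sigma`, and the sample rain rates are all assumptions made for this sketch.

```python
import random


def make_rainfall_ensemble(best_estimate, rel_error_sigma=0.3,
                           n_members=100, seed=0):
    """Return n_members perturbed copies of a rainfall time series.

    ASSUMPTION: each member differs from the best estimate by a single
    multiplicative lognormal factor that is constant in time. The error
    model actually used in the study is described in its Appendix.
    """
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        # One positive factor per member keeps each series non-negative
        # and preserves dry (zero-rain) periods.
        factor = rng.lognormvariate(0.0, rel_error_sigma)
        members.append([factor * rate for rate in best_estimate])
    return members


# Hypothetical domain-mean rain rates (mm/h), for illustration only.
rain = [0.2, 1.5, 4.0, 0.8, 0.0]
ensemble = make_rainfall_ensemble(rain)
```

Each of the resulting series would then be passed to the variational analysis in place of the best estimate rainfall.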
 When deriving the large-scale state using these alternate rainfall time series, all other observations have the same values as the best estimate, for example, temperature, moisture, and horizontal wind fields. Given that the boundary values of temperature and moisture are identical between all realizations, the horizontal advection terms of temperature and moisture differ very little. The variational analysis process generally equates larger values of rainfall with increased low-level convergence and upper level divergence and therefore generally larger values of vertical velocity. The structure of the derived vertical velocity, however, is also dependent on other budget terms so that vertical velocity does not monotonically increase with rainfall.
 Figure 1 shows the vertical velocity profile averaged over both the active and suppressed monsoon for each ensemble member as well as key percentiles of the ensemble. Stronger vertical motion is derived from time series with larger rainfall. In the active monsoon, there is always strong upward vertical motion, although the ensemble members with weaker rainfall have weaker vertical motion. During the suppressed monsoon, the ensemble members with strong rainfall have upward vertical motion at all levels. The ensemble members with weaker rainfall have upward motion at lower levels (below 650 hPa) but downward motion above. In addition to the ensemble members, Figure 1 shows the standard best estimate for vertical velocity. As is evident, the best estimate results are close, but not identical, to the 50th percentile of the ensemble forcing. While there is a large spread in omega, it is worth noting that 50% of the ensemble members lie in the limited range between the 25th and 75th percentile lines. While each ensemble member is equally likely, most cluster around the 50th ensemble member and the best estimate, and the most extreme omega values are rare. These differences in omega imply changes in low-level convergence and upper level divergence through the continuity equation and will have an effect on convection. These 100 large-scale “forcing” data sets are then used as input to the model simulations discussed below.
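The link between omega and the horizontal divergence noted above follows from mass continuity in pressure coordinates. In the form below, $D$ denotes the horizontal divergence and $p_t$ a top pressure level at which $\omega$ is assumed to vanish; both symbols are introduced here for illustration.

```latex
\frac{\partial \omega}{\partial p} = -\nabla_p \cdot \mathbf{V}_h \equiv -D,
\qquad
\omega(p) = -\int_{p_t}^{p} D(p')\,\mathrm{d}p'
```

Upper level divergence ($D > 0$) therefore produces negative $\omega$ (upward motion) beneath it, and low-level convergence ($D < 0$) returns $\omega$ toward zero at the surface.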
2.2 Case Description
 The TWP-ICE experiment experienced a range of atmospheric conditions. At the start of the experiment, the region experienced monsoon conditions. Between 23 and 24 January, a strong mesoscale convective system (MCS) passed through the domain followed by relatively suppressed conditions. There were then clear conditions from 3 to 5 February with little rain followed by monsoon break conditions to the end of the field campaign. Full details of the meteorological conditions can be found in May et al. . In this study, the focus is on the active period defined as 20 00Z–25 12Z Jan and the suppressed period defined as 28 00Z Jan–2 12Z Feb. The conditions during the clear and break periods are dominated by a strong diurnal cycle, which is driven by the land-sea contrast in the experiment domain. As SCMs cannot usually represent such contrasts in a single grid box, the later part of the experiment is excluded from the simulations presented here.
 In order to investigate the performance of the ensemble technique proposed here in different meteorological conditions, the study applies two sets of large-scale forcing data. The first is a best estimate simulation forced using the standard data set [Xie et al., 2010]. These simulations can be directly compared to the CRM results [Fridlind et al., 2012], and the best estimate simulations also form the basis of discussion in J. Petch et al. (submitted manuscript, 2012). In this study, the best estimate simulations will be used to form a SCM multimodel ensemble. In addition to the best estimate simulations, all models were run using the 100-member ensemble of forcing data derived above. It was found that some models showed numerical instabilities for the strongest forcing data sets (i.e., those derived from the largest rainfall) when using their standard time-stepping. As a consequence, the 10 strongest forcing data sets and, for reasons of symmetry, the 10 weakest ones are excluded from further analysis, reducing the ensemble size to 80.
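The symmetric trimming of the ensemble can be sketched as follows. The ranking metric (a single "strength" value per member, for example period-mean rainfall) and the member numbering are assumptions made for this illustration.

```python
def trim_extremes(strength_by_member, n_trim=10):
    """Drop the n_trim weakest and n_trim strongest ensemble members.

    strength_by_member maps a member id to a scalar forcing strength.
    ASSUMPTION: strength is summarized by one number per member.
    """
    ranked = sorted(strength_by_member, key=strength_by_member.get)
    if 2 * n_trim >= len(ranked):
        raise ValueError("n_trim removes the whole ensemble")
    return ranked[n_trim:len(ranked) - n_trim]


# Hypothetical strengths for members 1..100, increasing with member id.
strength = {member: 0.1 * member for member in range(1, 101)}
kept = trim_extremes(strength)
```

With 100 members and `n_trim=10`, the 80 members retained are those ranked 11th to 90th by forcing strength.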
 The aim when defining the model specification is to impinge as little as possible on the inherent characteristics of the individual models, and modelers are encouraged to use their preferred configurations; however, the following requirements are made for all simulations:
 The TWP-ICE domain has mixed surface types making the choice of surface type unclear. All simulations assume an ocean surface consistent with Fridlind et al. . Fixed time-invariant SST = 29°C is used. Interactive surface fluxes are required to be calculated in the boundary layer scheme.
 Simulations are initialized with observed temperature and moisture profiles at 0300Z 19 January 2006.
 An observed ozone profile is used where possible, but the McClatchey ozone profile [McClatchey et al., 1972] is used above the maximum height of observations (40 mbar).
 Full interactive radiation is used with a diurnal cycle for a domain centered on the Atmospheric Radiation Measurement (ARM) site (12.425°S, 130.891°E).
 Mean horizontal winds are relaxed to observed profiles with a 2 h time scale. There is no nudging of the temperature and moisture fields which are left free to respond to the forcing.
 Horizontal advective tendencies for temperature and moisture are prescribed, but the vertical terms are calculated by the models. Sensitivity studies showed a warm temperature bias above the tropopause when prescribing a total forcing as the model cannot freely evolve vertical advection associated with this warming and reduce the temperature bias. This method differs from Fridlind et al.  where temperature and moisture were nudged towards observed profiles to avoid such temperature biases.
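Among the requirements above, the wind relaxation is the simplest to write down. A minimal sketch is given here, assuming a forward Euler step of du/dt = -(u - u_obs)/tau at each model level; the participating models may discretize this term differently, and the function and variable names are hypothetical.

```python
TAU_SECONDS = 2 * 3600.0  # 2 h relaxation time scale from the case specification


def relax_wind(u_model, u_obs, dt_seconds, tau_seconds=TAU_SECONDS):
    """Apply one explicit relaxation step to a wind profile.

    u_model and u_obs are lists of wind speeds (m/s) on the same levels.
    ASSUMPTION: forward Euler time stepping with dt_seconds << tau_seconds.
    """
    weight = dt_seconds / tau_seconds
    return [u + weight * (uo - u) for u, uo in zip(u_model, u_obs)]


# Example: a 12 min step moves the model wind 10% of the way to the observation.
u_next = relax_wind([10.0, 5.0], [0.0, 5.0], dt_seconds=720.0)
```

No equivalent term is applied to temperature or moisture, which evolve freely in response to the prescribed forcing.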
 In this section, a brief description of all models used in this study will be given. Table 1 gives a summary of the models with further details given below. The study also includes two CRM which also simulate the ensemble. The CRM provide an important reference for the SCM and link to the CRM study [Fridlind et al., 2012].
Note to Table 1: Includes the acronym used in this paper, the full model name, contributing author(s), and the main reference for the model. Further model details are given in the text and the references therein. Note that there are two cloud-resolving models as part of this study.
 The UK Met Office SCM [Davies et al., 2005] contains parameterizations for radiation [Edwards and Slingo, 1996], layer-cloud microphysics [Wilson and Forbes, 2004; Wilson and Ballard, 1999], boundary layer processes [Lock et al., 2000], and convection; see also Martin et al. . Results were submitted for both the default UM convection scheme [Gregory and Rowntree, 1990; Martin et al., 2006; Derbyshire et al., 2011] (UM-GR) and the Plant and Craig  stochastic spectral mass-flux scheme (UM-PC). In the default scheme, convection is triggered by instability of surface parcels at the lifting condensation level (LCL); a CAPE closure is used for deep convection, and the closure for shallow convection is based on Grant . In the Plant-Craig scheme, convection is triggered by constructing potential updraft source layers and evaluating their buoyancy at the LCL; a CAPE closure is used. The stochastic variability of the Plant-Craig scheme depends upon the column size; an area of (50 km)² was used here.
 The single-column model of the NCAR CAM3 (SCAM) contains the radiation scheme as described in Collins et al. . The treatment of cloud condensation and microphysics in CAM3 [Boville et al., 2006] is based on Rasch and Kristjánsson  as updated by Zhang et al.  with separate prognostic equations for the liquid and ice-phase condensate. The boundary layer scheme is based on Holtslag and Boville  and Boville et al. . CAM3 includes the convection scheme of Zhang and McFarlane  with CAPE closure. CAM3-Liu (SCAML) [Wang et al., 2009] differs from SCAM in its cloud microphysics, introducing a double-moment cloud microphysics scheme [Liu et al., 2007], an explicit treatment of ice nucleation [Liu and Penner, 2005], and water vapor deposition on ice crystals and the Bergeron-Findeisen process in pure ice and mixed-phase clouds. SCAMR differs most fundamentally from SCAMS as the deep convection parameterization is replaced by the revised Zhang and McFarlane  scheme proposed by Zhang . The new convection scheme uses CAPE changes due to large-scale forcing (e.g., large-scale advection, radiative cooling) in the free troposphere, instead of CAPE itself, for closure.
 In the NCEP GFS model, the longwave radiation scheme follows Fels and Schwarzkopf  and Schwarzkopf and Fels . The shortwave radiation formulation uses multiband techniques [Slingo, 1989; Chou et al., 1998; Kiehl et al., 1998]. The cloud condensate is prognosed from a single-moment microphysics scheme [Zhao and Carr, 1997]. The boundary layer parameterization uses a nonlocal scheme [Hong and Pan, 1996]. The penetrative convection scheme [Pan and Wu, 1995] is simplified from Arakawa and Schubert , with a quasi-equilibrium assumption as a closure. Convection is triggered when a cloud work function exceeds a threshold. Shallow convection is parameterized as an extension of the vertical diffusion scheme [Tiedtke, 1983].
 The GFDL AM2 uses the shortwave radiation algorithm of Freidenreich and Ramaswamy , and the longwave radiation follows Schwarzkopf and Ramaswamy . It uses Slingo  and Held et al.  for liquid cloud radiative properties and Fu and Liou  for ice clouds. The microphysics scheme uses Rotstayn  with cloud fraction prognosed following Tiedtke . The microphysics used for convective clouds is rather crude, with prescribed precipitation efficiencies for shallow and deep convection. The boundary layer scheme follows Lock et al. . GFDL uses the relaxed Arakawa-Schubert scheme [Moorthi and Suarez, 1992] with a CAPE closure for shallow and deep convection.
 The GISS SCM used here is a developmental update of the Schmidt et al.  model. Radiation uses explicit multiple scattering calculations and the k-distribution approach to absorption. Large-scale clouds are based on the prognostic cloud water parameterization of Del Genio et al. , including all relevant microphysical processes, detrainment, and cloud top entrainment. Convective microphysics follows Del Genio et al. , which interactively partitions condensate into precipitating, detrained, and vertically advected components. The boundary layer uses dry conserved variables and includes local (diffusive) and counter-gradient flux terms. Moist convection is parameterized using a mass-flux scheme with convection triggered when a lifted parcel becomes buoyant. The mass flux is that required to produce neutral buoyancy at cloud base, with updraft speeds and entrainment rates based on Gregory .
 The CLUBB model, in these TWP-ICE simulations, is used in conjunction with the BUGSrad radiative transfer scheme [Stephens et al., 2001] and a single microphysics scheme [Morrison et al., 2009] for all clouds. Although in the prior literature CLUBB was tested only for boundary layer cloud cases [Golaz et al., 2002; Larson and Golaz, 2005; Larson et al., 2012], here CLUBB is used to simulate both deep and shallow clouds with a single, unified equation set. Unlike Larson et al. , here CLUBB is run as a single-column model and handles all cloud types without the use of a cloud-resolving model or any other host model. CLUBB prognoses various higher-order moments and achieves closure by use of a single multivariate subgrid PDF of velocity, moisture, and temperature. CLUBB has no explicit convective trigger; rather, the turbulence and thermodynamic variability generated in shallow convection are intended to evolve into deep convection when and where the large-scale forcings are appropriate.
 The single-column model JMA1 contains the parameterizations of the default Global Spectral Model [JMA, 2007; Nakagawa, 2009]. The radiation scheme has two-stream with delta-Eddington approximation for shortwave and table look-up and k-distribution methods for longwave. Cloud condensation and microphysics are based on Smith  and Sundqvist et al. . The boundary layer scheme is the level 2 closure scheme of Mellor and Yamada . The convection scheme is a multiplume type with cloud work function closure based on Arakawa and Schubert , two types (for shallow and deep convection) of prognostic equations of the upward mass-flux [Randall and Pan, 1993] and triggering functions [Xie and Zhang, 2000]. JMA2 is the same as JMA1, except for using modified convection and cloud schemes (T. Komori and K. Yoshimoto, Evaluation from a perspective of spin-down problem: Moistening effect of convective parameterization, submitted to CAS/JSC WGNE Research Activities in Atmospheric and Oceanic Modeling, 2012).
 There are two CRM in the study, which are briefly described here. The UKMO Large Eddy simulation model (LEM) uses the shortwave and longwave radiation scheme of Edwards and Slingo . The LEM employs a three-phase microphysics scheme, which is described in Gray et al. , and the microphysical configuration is the same as the UKMO-2A setup described in Fridlind et al. . The subgrid mixing scheme is a modified first-order Smagorinsky-Lilly scheme, which is described in MacVean and Mason .
 The model used in the System for Atmospheric Modeling (SAM) is described by Khairoutdinov and Randall  and uses the BUGSrad radiation scheme described in Stephens et al. . Single-moment microphysics were used as outlined in Khairoutdinov and Randall . The subgrid mixing scheme is a 1.5-order closure model [Khairoutdinov and Kogan, 1999]. The SAM model simulates nine ensemble members equally spaced in the range 10–90.
3.1 Simulations of Humidity and Precipitation
 This section gives an overview of both the temporal evolution and the vertical structure of the simulation of several moisture-related variables in the various models. Particular focus is given to comparing moisture-related variables, as large errors can arise in models, potentially due to the dependence of moisture on error-prone parameterizations. The convective component of total surface precipitation is discussed to highlight the different roles of model parameterization between the active and suppressed periods. Model accuracy will be discussed by comparison to observations for each model. The ensemble is then used to investigate model sensitivity in terms of the sources and sinks in the moisture budget. The best estimate is contrasted with the ensemble mean to directly determine how using an average of many simulations might affect results compared to a single simulation.
3.1.1 Overall Simulation Behavior
 Figure 2 shows time series of surface precipitation for the active and suppressed periods. Model ensemble means are shown as colored lines with individual ensemble members from all model simulations overlaid in gray. In this figure, all UM-type, SCAM-type, and JMA-type models are averaged together as they are very similar. Observations are shown as a heavy black line. This plot allows broad interpretation of the characteristics of each model while capturing the spread of the ensemble. Figure 2 shows that all models have similar precipitation during the active period, with moderate precipitation before the passage of the MCS on 23–24 January. All models have similar heavy rain associated with the MCS. The ensemble is spread around this mean, with the largest spread occurring during the MCS. Modeled precipitation is close to observations, which may be anticipated, as in strongly forced conditions precipitation will be predominantly driven by forcing in all ensemble members [Xie et al., 2005; Xu et al., 2002; Woolnough et al., 2010].
 Period-mean precipitation during the suppressed period is lower than during the active period. It is evident that the relative differences in the ensemble mean time evolution between models as well as the differences from observations are larger than those in the active period. This might be expected as the forcing is weaker and as a consequence has less of an influence on the model solutions. In weakly forced conditions, it is expected that the details of the parameterizations in the various models exert a stronger influence, which explains the larger differences in the suppressed period. The ensemble spread is rather uniform and does not increase substantially with rainfall, which remains light throughout the period. The CRM behave similarly to the SCM. In the active period, solutions from the two model types track each other closely, again highlighting that precipitation is constrained by the forcing in that period. Just as for the SCM, the differences between CRMs as well as to observations increase (in a relative sense) in the suppressed period. The CRM results in the active period strongly resemble the results of the larger CRM comparison [Fridlind et al., 2012], indicating that the CRMs shown here provide a representative sample for this family of models.
 Figure 3 provides a comparison of the multimodel best estimate ensemble and individual model ensembles for the time-mean surface precipitation averaged over the active and suppressed periods for all simulations used in this study. Each model is included as a box-whisker plot constructed from the time-averaged precipitation for each ensemble member. Observations are also included. It can clearly be seen that the ensemble SCM and CRM encompass a wide range of surface precipitation values. The models capture the spread seen in the observations. This is due to strong coupling between the forcing, which is primarily through vertical velocity, and rainfall.
 The multimodel ensemble has a limited spread of surface precipitation as all models are simulating the same forcing. Figure 3 provides a useful check that the multimodel ensemble has limited spread compared to the SCM and CRM simulations. This result supports the findings of Hume and Jakob  that the largest spread in an SCM ensemble will be found by varying the forcing (the boundary conditions). Figure 3 also shows the ensemble mean (small asterisk) and best estimate mean (large asterisk) precipitation for the observations and models. For most models, the magnitude of the best estimate observed precipitation is very close to the 50th percentile (median) precipitation with the ensemble mean larger. This is due to the ensemble having a distribution which is skewed towards high values of precipitation, leading to larger means than medians.
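The effect of skewness on the mean and median can be seen in a small numerical example. The precipitation values below are hypothetical, chosen only to have a long tail toward high values; they are not results from the simulations.

```python
import statistics

# Hypothetical time-mean precipitation (mm/day) for nine ensemble members,
# chosen only to be right-skewed; these are not values from the study.
member_precip = [8.0, 9.0, 10.0, 10.5, 11.0, 12.0, 14.0, 18.0, 25.0]

ensemble_mean = statistics.mean(member_precip)
ensemble_median = statistics.median(member_precip)

# The few large values pull the mean above the median.
mean_exceeds_median = ensemble_mean > ensemble_median
```

In this example, the two largest members raise the mean well above the central member, while the median is unaffected by how extreme they are.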
 Figure 4 shows time-height cross sections of the observed, SCM-mean and CRM-mean modeled relative humidity. Relative humidity provides a useful perspective on the model simulations, since unlike precipitation, which is primarily driven by forcing, relative humidity is less constrained by the forcing and more affected by model physics [Emanuel and Zivkovic-Rothman, 1999]. Given the model setup (section 2.2), the models have freedom to develop certain moisture source/sink terms such as moisture convergence and surface evaporation. The ensemble sensitivity to these terms will be addressed in section 3.1.3. Relative humidity with respect to water has been calculated using Tetens' formula [Lowe, 1977, equation 6] for each individual simulation from values of temperature, water vapor, and pressure to ensure consistency across models. The modeled data shown in Figure 4 are an average over all models and all ensemble members used. Detailed investigation shows that relative humidity differences are primarily caused by differences in moisture, as temperature varies little across the simulations and is close to observations.
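A minimal sketch of such a relative humidity calculation is given below. It assumes the commonly quoted Tetens-type coefficients for saturation over liquid water, which may differ in detail from equation 6 of Lowe [1977], and it assumes that water vapor is supplied as a mixing ratio in kg/kg. The function names are hypothetical.

```python
def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over liquid water (hPa), temp_c in deg C.

    ASSUMPTION: commonly quoted Tetens-type coefficients; the exact form
    used in the study may differ.
    """
    return 6.1078 * 10.0 ** (7.5 * temp_c / (temp_c + 237.3))


def relative_humidity_percent(temp_k, mixing_ratio, pressure_hpa):
    """Relative humidity with respect to water (%).

    temp_k: temperature (K); mixing_ratio: water vapor mixing ratio (kg/kg);
    pressure_hpa: air pressure (hPa).
    """
    e_sat = saturation_vapor_pressure_hpa(temp_k - 273.15)
    # Vapor pressure from mixing ratio, with 0.622 the ratio of the
    # molecular weights of water vapor and dry air.
    e = mixing_ratio * pressure_hpa / (0.622 + mixing_ratio)
    return 100.0 * e / e_sat


rh_example = relative_humidity_percent(300.0, 0.010, 1000.0)
```

Applying one function to the temperature, moisture, and pressure output of every model removes any dependence on each model's internal definition of saturation.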
 Observations show that the atmosphere has high relative humidity through a deep layer during the active period, but the models generally underestimate humidity particularly at low levels. During the suppressed period, observations show lower humidity above 800 hPa but large values in the boundary layer. All models capture the reduction in relative humidity caused by drying on the transition to the suppressed period above 700 hPa, although the SCM overestimate the reduction in humidity. Both SCM and CRM persist with low values of humidity in the boundary layer compared to observations.
 While Figure 4 shows the evolution of the mean state, the ensemble simulations also allow investigation of model sensitivity. Figure 5 shows time series of 500 mbar relative humidity for all ensemble members for each model compared to their best estimate simulations, ensemble mean, and observations. Relative humidity at 500 mbar is chosen, as accurate representation of moisture in midlevels is important if models are to correctly represent cloud. All models have high 500 mbar relative humidity during the active period consistent with the observations, but most SCM tend to have lower relative humidity than the CRM. The JMA and GISS models have particularly low relative humidity which is about 10% and 15% lower than the observations, respectively. The CLUBB and NCEP models have slightly larger relative humidity compared to the observations. All models have very limited spread during the active period.
 Observations show that during the transition to the suppressed period, humidity reduces to around 60% after the passage of the MCS. Relative humidity increases slightly before it reduces again from 70% to 30% between days 27 and 31 (27–31 January). There is a big difference between the responses of the SCM and CRM during this period. The CRM capture the transition to the suppressed period reasonably well with relative humidity 10% too low but its temporal evolution well captured. SCM generally reduce relative humidity too much in the transition period with mean values after the transition ranging from 40% (UM) to 10% (JMA). An exception to this is the CLUBB model which does not excessively reduce relative humidity during the transition and is then too moist during the suppressed period.
 The CRM show limited spread during the active period and the passage of the MCS. The SCM show larger, but still limited, spread in the active period and in the transition associated with the MCS. In both model types, the spread is largest during the suppressed period. This suggests that the simulation of midlevel relative humidity may be more sensitive to changes in the forcing when the forcing is weak. Furthermore, this sensitivity results in nonlinearity between the ensemble members, which is particularly apparent during the suppressed period. For example, around 30 January, the lowest (highest) relative humidity in the SCAMS model occurs in ensemble members with weaker (stronger) forcing, but not in the members with the weakest (strongest) forcing.
 Figure 5 shows that in general the ensemble mean and best estimate simulation results follow each other quite closely, so that their differences from observations are similar. On some limited occasions, the ensemble mean is closer to the observations than the best estimate, for example, UM-PC during both the active and suppressed periods and CLUBB and SCAMS during the suppressed period. To further investigate how the ensemble mean relates to the best estimate, Figure 6 shows profiles of the difference between the best estimate and the ensemble mean relative humidity for all SCM for the active period. When averaged over this period, most models have similar best estimate and ensemble mean relative humidity. However, there are some important exceptions. For example, the UM-PC has larger ensemble mean relative humidity than its best estimate throughout the depth of the troposphere. This larger relative humidity in the ensemble mean represents an improvement in the model simulations by bringing the values closer to observations. As UM-PC is the only SCM to include a stochastic parameterization, this result highlights that ensemble simulations are necessary when using models with stochastic components. The usefulness of the ensemble approach will be investigated further below.
 When comparing the ensemble simulations with observations (Figure 5), it is possible, for some models and periods, to determine whether the errors are due to the prescribed forcing or are model errors. Although the ensemble forcing spans the range of possible observations, none of the JMA ensemble members closely approximates the observed relative humidity during the active period. Therefore, this model clearly has limitations in correctly simulating relative humidity during this period. For many models (including SCAM, NCEP, GISS, and JMA), the ensemble shows that errors in the transition to the suppressed period are likely to be attributable to model error rather than errors in the forcing. The GISS model also consistently underestimates relative humidity during the suppressed period.
3.1.2 Precipitation Partitioning
 An interesting question in the simulation of tropical convection is how the various SCMs partition the precipitation between convection and the resolved scale motion. Furthermore, given the construction of the ensemble used here, it is possible to study how this partitioning changes with forcing strength and meteorological situation. Figure 7 shows the time average convective precipitation fraction (CPF), defined as the ratio of convective precipitation to total precipitation at the surface, against total precipitation for both the active and suppressed periods. Each SCM is shown by a color with different symbols used for the different models. Each point represents a single ensemble member averaged over the period of interest. An increase in total precipitation (x axis) indicates an increase in forcing strength. The multimodel best estimate ensemble is shown as large asterisks.
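 The CPF definition above can be sketched in a few lines. This is a minimal illustration only: the function name and the input arrays are hypothetical, and it assumes that the time average is taken as the ratio of the period-mean convective precipitation to the period-mean total precipitation (rather than the mean of instantaneous ratios), which the text does not specify.

```python
import numpy as np

def convective_precipitation_fraction(conv_precip, total_precip):
    """Time-averaged CPF for one ensemble member over one period.

    Assumption (not stated in the text): CPF is the ratio of the
    period-mean convective precipitation to the period-mean total
    surface precipitation.
    """
    conv = np.asarray(conv_precip, dtype=float)
    total = np.asarray(total_precip, dtype=float)
    mean_total = total.mean()
    if mean_total <= 0.0:
        # No precipitation in the period: the fraction is undefined.
        return float("nan")
    return float(conv.mean() / mean_total)

# Illustrative (synthetic) time series for a single ensemble member.
example_conv = [1.0, 2.0, 3.0]
example_total = [2.0, 4.0, 6.0]
example_cpf = convective_precipitation_fraction(example_conv, example_total)
```

 Applying such a function to every ensemble member of a model, and plotting the result against the member's period-mean total precipitation, would give one point per member in a diagram of the kind described for Figure 7.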
 Generally, there is a wide spread in the magnitude of CPF between the models ranging from 0.2 to 0.9 in the active period and 0.5 to 1 in the suppressed period. In the active period, the models also show a very diverse behavior with forcing strength, with some showing an increase in CPF (e.g., GISS, UM-GR), some showing a near-constant CPF (e.g., NCEP, SCAM), and some showing a decrease (e.g., UM-PC). The GFDL2 model shows a somewhat erratic behavior. Models of the same type show different behavior depending on the parameterization scheme used (e.g., UM-PC versus UM-GR).
 In the suppressed period, all SCMs have a CPF greater than 0.5. There is a tendency in almost all models for the CPF to increase with increasing forcing, although there is much scatter in the relationship. There are two groups of models, with either very high or relatively low CPF. There is some consistency between the periods, with the GISS and UM-PC models showing the lowest CPF in both.
 The rather wide spread in model behavior is likely indicative of large differences in the assumptions made in the different convection treatments on how to partition rainfall between convection and the larger scales. As this will likely have an impact on the vertical distribution of heating and moistening, an important issue for future work is to provide observational constraints for the relationships shown here.
3.1.3 Ensemble Moisture Budget Characteristics
 The ensemble provides an opportunity to investigate the interplay between modeled moisture and the moisture budget terms. In particular, this study permits a comparison of how the models control their moisture budgets. Given that the models are forced by prescribing horizontal advection terms and vertical velocity, they independently develop moisture budget terms such as vertical advection terms and moisture convergence in addition to the moisture contributions from parameterized processes such as convection and surface evaporation. This is an important difference between this study and previous intercomparisons [e.g., Woolnough et al., 2010; Guichard et al., 2004] where the total moisture forcing was prescribed. Furthermore, given that this study includes both best estimate and ensemble simulations, the additional model characteristics exposed by an ensemble, compared to a single best estimate simulation, can be assessed.
 Figure 8 shows time-averaged precipitable water against various terms in the moisture budget for the active period for all models and ensembles in this study. Very similar results are obtained for the suppressed period (not shown). Figure 8a shows that during the active period, the SCMs tend to divide into models in which lower precipitable water is associated with larger precipitation (GISS and SCAM), models where precipitable water is higher for larger values of precipitation (UM and CLUBB), and those models, including the CRM, where precipitation is independent of precipitable water. The GFDL model is something of an exception, as its relationship shows significant scatter.
 The largest term in the moisture budget is the moisture convergence term, which is shown in Figure 8b. In all models, the moisture convergence term shows a similar magnitude and characteristics to precipitation, which is not surprising as it is the largest source of moisture for the grid box, exceeding surface evaporation by an order of magnitude (see below). Furthermore, Figure 8b shows that the moisture convergence acts as a feedback mechanism where SCM with larger values of precipitable water enhance moisture supply and produce more precipitation. Other models, despite the strong forcing, have lower precipitable water and lower moisture convergence. Petch et al. (submitted manuscript, 2012) discuss a likely reason by comparing the method used to force the SCM with the method used to force the CRM in Fridlind et al. [2012]. It was found that given a positive moisture bias, convergence (which occurs during the active period) increases that positive bias, and similarly convergence enhances a negative moisture bias. Models forced by prescribing the total moisture forcing, as used in Fridlind et al. [2012], do not develop these biases. The ensemble results shown in Figure 8b support the findings of Petch et al. (submitted manuscript, 2012). This model response to bias is not, however, apparent when only the best estimate simulations are considered. GISS and SCAM both have a drier atmosphere during the active period compared to the observations and other SCM, which results in reduced precipitation compared to those SCM with a moister atmosphere.
 Another important term in the moisture budget is surface evaporation. Figure 8c shows this term for each model and ensemble member as before. Note that the surface evaporation term is an order of magnitude smaller than the moisture convergence contribution. It is evident that there is a fundamentally different relationship between forcing strength and evaporation in the SCMs and the CRMs, indicating differences in the physical mechanisms at work in these two classes of models. All SCMs approximate a quasi-linear relationship of evaporation to precipitable water, albeit of varying strength, with larger surface evaporation at lower values of precipitable water and lower surface evaporation when precipitable water is high. This is consistent with the formulation of the SCMs as, given that low level winds and SST are prescribed in all models, evaporation can only change in response to atmospheric moisture. The CRMs, on the other hand, show a very different response to changes in the forcing. Here, the values of evaporation are independent of precipitable water. This indicates the importance of small-scale wind variability in driving surface evaporation. In the SCMs, this variability is not resolved. Unless it is parameterized, SCM surface fluxes are determined by the mean wind alone. In the CRMs, this wind variability is resolved and hence will enhance the surface fluxes. From the results, it is evident that the SCMs do not deal effectively with the subgrid variability. This result highlights the usefulness of the ensemble approach as this “error” in the SCMs would not have been evident from a set of single best estimate simulations.
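 The argument about mean wind versus resolved wind variability can be illustrated with a simple bulk aerodynamic flux estimate. This sketch is an assumption introduced here for illustration: the bulk formula, the exchange coefficient, the air density, and all numerical values are hypothetical and are not taken from any of the models in this study. It shows only that, for fixed surface and air humidity, a flux computed from the mean of the wind speeds exceeds one computed from the speed of the mean wind when the wind varies within the domain.

```python
import numpy as np

def bulk_evaporation(wind_speed, q_sat_surface, q_air,
                     rho_air=1.2, exchange_coeff=1.2e-3):
    """Bulk aerodynamic evaporation estimate in kg m-2 s-1.

    Hypothetical illustration: rho_air (kg m-3) and exchange_coeff
    (dimensionless) are assumed typical values, not model settings.
    """
    speed = np.asarray(wind_speed, dtype=float)
    return rho_air * exchange_coeff * speed * (q_sat_surface - q_air)

# Synthetic near-surface wind component (m s-1) at two subgrid locations.
u_subgrid = np.array([-1.0, 5.0])
q_sat = 0.022   # saturation specific humidity at the prescribed SST (kg/kg)
q_air = 0.018   # near-surface specific humidity (kg/kg)

# SCM-like estimate: one flux from the speed of the domain-mean wind.
evap_mean_wind = float(bulk_evaporation(abs(u_subgrid.mean()), q_sat, q_air))

# CRM-like estimate: flux computed locally, then averaged over the domain.
evap_resolved = float(bulk_evaporation(np.abs(u_subgrid), q_sat, q_air).mean())
```

 With the mean wind and SST held fixed, the first estimate can change only through `q_air`, whereas the second also responds to changes in the wind variability, which is the distinction drawn in the text between the SCMs and the CRMs.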
 By using an ensemble approach, several interesting conclusions about model performance as well as simulation setup could be drawn. Given that strong precipitation in the models (and in nature) is strongly linked to moisture convergence, this exposes some interesting model behavior. By design of the simulations, moisture convergence is calculated by the models. Consequently, those models that develop a dry bias cannot develop large moisture convergence and do not produce as much precipitation, with the opposite effect occurring in models with a moist bias. The SCMs require a drier atmosphere to develop stronger surface evaporation. In contrast, the CRMs develop evaporation changes independent of atmospheric moisture likely due to the development of subgrid scale wind variability not present in the SCMs.
 This section investigates the simulation of cloud-related variables in the CRMs and SCMs. Initially, the vertical structure of liquid water and ice clouds is discussed for both the active and suppressed monsoon. The ensemble is then used again to investigate relationships between cloud-related variables as the forcing strength changes. This will expose several interesting characteristics of the various model parameterizations.
3.2.1 Profiles of Cloud Properties
 Figure 9 shows vertical profiles of the ensemble mean model cloud fraction for all models during the active (top left) and suppressed (bottom left) periods, as well as selected examples of the full ensemble from three models for the active (top right) and suppressed (bottom right) periods. Cloud fractions generally reflect the meteorological conditions shown in Figure 4, with cloud throughout the troposphere during the moister active period and two cloud layers during the suppressed period: low cloud between 950 and 750 hPa and high ice cloud above 200 hPa.
 During the active period, there are large differences in CRM cloud fraction of around 30% at all levels, and the SCMs mostly fall within the range of the CRMs. This can largely be explained by the definition of cloud fraction, which in the LEM includes both cloud and precipitating hydrometeors, while in the SAM model it only includes cloud water and ice. All SCMs have cloud fraction less than 30% below 600 hPa and more cloud (with the exception of JMA) above. There are large differences between the models with slightly better agreement in lower levels than in the upper troposphere.
 The differences in cloud fraction in the SCMs are also large in the suppressed period. One noticeable feature of the selected full ensembles (right panels) is that the difference of individual ensemble members from their mean tends to be smaller than the differences between models. This indicates that the differences in the simulated cloud structures are dominated by the structural properties of the models, not by the forcing data set, and shows that model representation of cloud is liable to error independent of the meteorological conditions. Best estimate simulations are therefore likely sufficient to expose model differences in this variable. This is investigated in Figure 10, which shows the differences of profiles of cloud cover between the ensemble mean and the best estimate simulation. As for relative humidity, most models show only small differences, with two notable exceptions: the UM-PC around 400 hPa and GFDL below 700 hPa.
 Figure 11 shows profiles of ice water content in all models for the active period. Again, the ensemble means for all models are shown in the left panel, while selected full ensembles are shown in the right panel. The suppressed period is omitted from this figure as the ice cloud during this period is not linked to local convection and is not well simulated. There are large differences between ice water content in both the CRMs and the SCMs during the active period, which will affect the model radiation budgets. Modeled ice water content differs in terms of both magnitude and vertical structure. Differences in the structural properties can again be noted in modeled ice water content, with each SCM clustering around its ensemble mean. Difficulty in representing ice microphysics has been noted in all other TWP-ICE intercomparison studies and has been unanimously suggested as a focus for future model development.
 This section has shown that there are substantial differences in the vertical structure of parameterized cloud variables which may be attributable to systematic differences in the representation of clouds between the models. Structure in the cloud variables is clearly identifiable using the ensemble in both the active and suppressed periods. These persistent structures show that the models are not sensitive to changes in the forcing and that for most models best estimate simulations are likely sufficient to expose the mean model behavior in both periods. It is clear from the large differences between them that the CRMs only provide a limited estimate of the truth, especially during the suppressed period, as their representations of clouds are limited themselves [Fridlind et al., 2012].
3.2.2 Ensemble Cloud Characteristics
 While the previous section showed that it is likely that the mean cloud properties of each model can be exposed by a single best estimate simulation, the full ensemble results provide a useful tool to investigate how relationships between variables might change within each model as the forcing varies across ensemble members. Representing the correct relationships between variables is a greater challenge for models than representing means, but it is also a necessary condition for applying the models over a wide range of conditions, such as in a full GCM. This subsection will investigate how the ensemble developed here can be used to investigate relationships between different variables. Each ensemble member, experiencing different forcing data, can be considered as a separate test case, albeit spaced in a controlled manner from all other ensemble members.
 Figure 12 shows the mean liquid water path (LWP) as a function of the mean surface precipitation averaged over the active (left) and suppressed (right) periods for all models. Each symbol represents an individual ensemble member. While there are generally different relationships between the two periods (note the change in scale between periods in the figure), the CRMs show that the relationship between LWP and precipitation is linear (with a gradient of approximately 250 kg m−3 h in both the active and suppressed periods). The CRMs agree very well during the suppressed period but differ at the larger precipitation rates during the active period.
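 A gradient of this kind can be estimated from the ensemble members of a single model with an ordinary least squares fit. The sketch below is illustrative only: the function name is hypothetical, the input values are synthetic, and the use of a first-degree polynomial fit together with a correlation coefficient as a rough check on linearity is an assumption, not a description of how the gradients quoted in the text were obtained.

```python
import numpy as np

def lwp_precip_gradient(precip, lwp):
    """Least squares line through (precipitation, LWP) pairs.

    Each pair is the period mean for one ensemble member.
    Returns (gradient, intercept, correlation coefficient).
    """
    precip = np.asarray(precip, dtype=float)
    lwp = np.asarray(lwp, dtype=float)
    gradient, intercept = np.polyfit(precip, lwp, 1)
    correlation = np.corrcoef(precip, lwp)[0, 1]
    return float(gradient), float(intercept), float(correlation)

# Synthetic ensemble: LWP increases exactly linearly with precipitation.
member_precip = np.array([1.0, 2.0, 3.0, 4.0])
member_lwp = 0.25 * member_precip + 0.1
fit_gradient, fit_intercept, fit_corr = lwp_precip_gradient(member_precip, member_lwp)
```

 The units of the fitted gradient follow from the units of the inputs; a correlation coefficient well below one would flag models in which the relationship is scattered or nonlinear.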
 Most, but not all, SCMs also produce a linear relationship between LWP and surface precipitation. Notable exceptions are the GFDL, SCAM, and JMA models. The relationships in the SCMs differ somewhat between the active and suppressed periods, with a tendency for models to have tighter and more linear relationships during the active period. In the suppressed period when precipitation is small, both the UM and NCEP models tend to have precipitation independent of LWP, which itself is at an almost constant value. The GFDL and SCAM models tend to display significant scatter in LWP with only a weak relationship to precipitation. In fact, only the CLUBB, GISS, and JMA models increase LWP with precipitation as the CRMs suggest during the suppressed period. A linear relationship was observed between LWP and precipitation in Fridlind et al. [2012].
 The CRMs tend to lie in the middle of the SCM distribution, suggesting that the SCM ensemble mean may approximate the correct values of LWP, although individual models may differ quite considerably from the CRMs. The UM and NCEP models are biased low at all times, whereas the GISS and one of the JMA models have an LWP that is too large during the suppressed period. Unlike for cloud fraction before, the best estimate simulations do not always fall close to the center of the ensemble (note the position of the large asterisks for GFDL and one JMA model relative to their associated ensembles during the suppressed period). The ensemble results also expose interesting nonlinearities in some of the models. For instance, there is a discontinuity in LWP in the GFDL around 0.15 kg m−2 during the active period. This possibly relates to the discontinuity in the convective precipitation fraction in Figure 7. While magnitude differences are apparent in the multimodel ensemble, the relationships between LWP and precipitation are only found in the full ensemble, showing the potential usefulness of an ensemble technique when identifying model behavior.
 Figure 13a shows the relationship between IWP and precipitation during the active period. It can be seen that similar to LWP, IWP generally has a linear relationship with precipitation. Unlike the relationship of precipitation with LWP, the one with IWP is not consistent between the CRMs. There are very different magnitudes of IWP in the CRMs, and the slope of the relationship to precipitation varies strongly as well. Large differences in the simulation of IWP in CRMs have been identified in other studies [Fridlind et al., 2012]. The existence of those discrepancies makes it difficult to use the CRM results in assessing the SCM behavior. The LEM has approximately double IWP compared to SAM with a gradient of 300 kg m−3 h (LEM) compared to 50 kg m−3 h in SAM. Most SCM have gradients around this range, although in the NCEP model IWP is relatively insensitive to forcing.
 The ensemble enables the comparison not only between models but also of different versions of the same model. For example, SCAMS and SCAMR have very similar IWP, whereas SCAML, using a different microphysics scheme, has twice the IWP of the other SCAM models. There is a more marked difference between the two versions of the UM. UM-PC follows closely the gradient and approximate magnitude of the LEM, which is the UK Met Office's CRM and was used in the formulation of the Plant and Craig stochastic convection parameterization scheme. The UM-GR, on the other hand, is close to the SAM CRM, which shows that there is complex interplay between the parameterization schemes. The UM SCM differ only in their convection parameterization, but this has a large effect on the IWP produced. In general, there is a split between models that follow the strong slope of the LEM and those closer to the weaker slope of the SAM. It is not possible, however, to attribute the relationship between precipitation and IWP simply to the model microphysics scheme.
 Figure 13b shows the relationship between LWP and IWP and shows different aspects of the relationships between the variables in the models. There is a clear split between some models that have larger ranges in LWP (e.g., SCAM and JMA models) and others that have larger ranges in IWP (e.g., UM-PC and CLUBB). Fridlind et al. [2012] found that 2-D CRMs have a weaker relationship than 3-D CRMs between IWP and LWP, which is contrary to Figure 13a. However, the 2-D version of the LEM used here was not part of the Fridlind et al. [2012] study, and furthermore, the SAM here used a single-moment microphysics scheme, whereas the SAM in Fridlind et al. [2012] used a double-moment scheme [Morrison et al., 2009], so a direct comparison is not possible.
 Interestingly, considering only the multimodel ensemble (Figure 13b) shows a different relationship between LWP and IWP compared to the relationship shown in the individual ensemble simulations. The ensemble within each model suggests increasing IWP with LWP, whereas the multimodel ensemble would suggest a tendency for IWP to increase with decreasing LWP. The multimodel ensemble therefore suggests the reverse of the relationship found when each CRM and SCM simulates its own ensemble, which illustrates the differences and potential limitations of using a multimodel ensemble.
4 Summary and Discussion
 This study presents an ensemble of SCM and CRM simulations for the TWP-ICE period. The first purpose of the study was to derive an ensemble of model forcings based on observational uncertainty. This data set was then applied to a variety of models to assess what new information about model behavior and model error might be gleaned from an ensemble approach that could not be attained by a single realization commonly used in CRM and SCM studies. It was found that the overall model behavior in terms of the time evolution of thermodynamic variables or the time-averaged vertical structure of those variables generally changes little between the ensemble mean and a single “best estimate” simulation. However, there were some notable exceptions to that finding. In some model simulations, like those with the UM-PC, ensemble means deviate from the best estimate simulations throughout the troposphere. Given that the ensemble mean forcing is close to that of the best estimate, this indicates nonlinearities in the simulation behavior possibly due to the stochastic component of the model. The ensemble also shows that models have greater sensitivity when weakly forced, and therefore, an ensemble is necessary. Perhaps the main value the ensemble adds to single simulations is the possibility to investigate the changes in model behavior with changes in forcing. This has proved invaluable in highlighting several aspects of model behavior in this study, namely, (i) a distinctly different behavior in the SCMs from that in the CRMs in achieving changes in surface evaporation, (ii) the sensitivity to the particular forcing method applied, (iii) a wide spread in the convective precipitation fraction in models and its sensitivity to forcing strength, and (iv) distinctly different model behavior in the relationships between cloud variables and precipitation.
 Examining the terms of the moisture budget using the ensemble enabled interesting conclusions about model behavior for two important terms: the surface evaporation and the moisture convergence. A clear distinction exists between the CRMs and the SCMs. In the CRMs, evaporation increases for constant atmospheric moisture, whereas the SCMs can only increase evaporation by drying the atmosphere. This suggests a role of subgrid variability, likely brought about by cold pools in the CRMs, that is not parameterized in SCMs. A representation of cold pool dynamics in SCMs would allow surface evaporation to occur in a moist atmosphere. Studying the moisture convergence term as a function of forcing strength revealed an interesting feedback between model error and the particular forcing approach chosen here. As the models are forced with horizontal moisture advection and vertical motion profiles (and hence profiles of mass convergence and divergence), they develop their own vertical moisture advection and moisture convergence terms. In models that develop a moist/dry bias, this bias is reinforced by an increase/decrease of the moisture convergence into the region. This limitation can easily be deduced using the ensemble approach, while it would go largely unnoticed in single simulations with a number of models.
 The ensemble was also shown to be useful in investigating cloud variables and their relationships. Ensemble vertical profiles generally highlight structural differences between different models in that all ensemble members of a particular model tend to lie closer to its mean than to that of other models, even with large variations in the forcing. Consistent with the results in the accompanying modeling studies for TWP-ICE [Lin et al., 2012; Fridlind et al., 2012; Zhu et al., 2012], large differences are found in the models' simulation of cloud ice, highlighting this area once again as one warranting further study. The ensemble is used to identify relationships between liquid water, cloud ice, and precipitation. CRM simulations, while varying in magnitude, show clear linear relationships between those variables. This behavior is not reproduced in all SCMs, some of which show strongly nonlinear behavior or even jumps. The ensemble also reveals that the ice water path to liquid water path relationships are very different between models, with one group of models showing a very strong increase of IWP with LWP, while in others IWP is almost independent of LWP. This conclusion applies to both CRMs and SCMs. Using the multimodel best estimate ensemble only, the important relationship of increasing ice water path with liquid water path in individual models is reversed.
 This study shows that the introduction of an ensemble to a modeling study provides more information than might be gathered by simulating only a single best estimate forcing. While the method does not replace the standard best estimate approach to single-column modeling, it complements it by (i) providing an easy framework to study model sensitivities and (ii) increasing confidence in detecting model behavior that is likely due to model, rather than forcing, limitations. Future SCM studies should therefore consider performing ensemble simulations in addition to, rather than instead of, the more conventional best estimate method. Despite the additional information provided by the ensemble, it remains difficult to conclusively link model behavior in an SCM to parameterization assumptions, highlighting the need to embed studies like the one presented here into a larger framework of model evaluation.
Appendix A: Derivation of the Large-Scale Forcing Ensemble
 An important part of this study is the use of an ensemble of large-scale forcing data sets. The motivation for doing so is to assess the inherent uncertainty in deriving a single best estimate of the large-scale atmosphere from observations and in its subsequent application to drive model simulations. This appendix describes the construction of the ensemble used in this study, which is based on two steps: (i) estimate errors in the area-mean rainfall and construct alternative rainfall scenarios and (ii) apply a constrained variational analysis to each of the rainfall scenarios derived in the first step to yield the final ensemble of large-scale atmospheric states.
A1 Deriving an Ensemble of Rainfall Estimates
 The main source of area-mean rainfall information in this and other TWP-ICE studies [e.g., Xie et al., 2010] is the set of rainfall estimates from a C-band polarimetric radar located near Darwin [Keenan et al., 1998]. The algorithm used to estimate rainfall from radar variables is that of Bringi and Chandrasekar . While the radar provides excellent spatial coverage to estimate area means, deriving rain rates from radar variables will lead to errors in the rainfall estimates. A first step in the ensemble construction is to estimate these errors. To do so, we use rain gauge observations around Darwin and apply a method very similar to that of Jordan et al. .
 Radar rain rates vary in space and time, and radar errors may vary considerably with the location and timing of rain events. The array of rain gauge data shown in Figure 14 is used as a reference for the radar-derived rainfall data. A grid of 3×3 radar pixels (approximately 9 km²) is averaged and compared to rain gauge measurements accumulated over periods of 180 min in which both rainfall totals are greater than 1 mm. By performing this analysis at many locations over the TWP-ICE domain, it is anticipated that the differing sources of error may be better accounted for.
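 The radar-to-gauge comparison described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name, the array layout, the use of non-overlapping 180 min windows, and the number of radar scans per window are all hypothetical, and the gauge is assumed not to lie on the edge of the radar grid.

```python
import numpy as np

def radar_gauge_ratios(radar, gauge, row, col, steps_per_window, min_accum=1.0):
    """Ratios of radar to gauge rainfall accumulations at one gauge site.

    radar: array (time, ny, nx) of rainfall per time step in mm.
    gauge: array (time,) of gauge rainfall per time step in mm.
    row, col: radar pixel containing the gauge (assumed interior pixel).
    steps_per_window: time steps per accumulation window (assumed
        non-overlapping; e.g., 18 steps if scans were 10 min apart).
    min_accum: both accumulations must exceed this value (mm).
    """
    radar = np.asarray(radar, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    # Average the 3x3 block of radar pixels centred on the gauge pixel.
    block = radar[:, row - 1:row + 2, col - 1:col + 2].mean(axis=(1, 2))
    n_win = len(gauge) // steps_per_window
    n_used = n_win * steps_per_window
    radar_acc = block[:n_used].reshape(n_win, steps_per_window).sum(axis=1)
    gauge_acc = gauge[:n_used].reshape(n_win, steps_per_window).sum(axis=1)
    # Keep only windows in which both accumulations exceed the threshold.
    keep = (radar_acc > min_accum) & (gauge_acc > min_accum)
    return radar_acc[keep] / gauge_acc[keep]

# Synthetic example: two windows, the second with a dry gauge.
demo_radar = np.full((36, 3, 3), 0.2)
demo_gauge = np.array([0.1] * 18 + [0.0] * 18)
demo_ratios = radar_gauge_ratios(demo_radar, demo_gauge, row=1, col=1,
                                 steps_per_window=18)
```

 Repeating such a calculation at each gauge would yield one sample of ratios per gauge location, of the kind summarized for two gauges in Figure 15.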
 Examples of the ratio of radar-derived rainfall data to rain gauge rainfall data are shown in Figure 15 for two rain gauges. Assuming that rain gauge data may be a better estimate of rainfall than radar-derived data, ratios close to 1 suggest small errors in the radar data, with smaller standard deviations showing the clustering of the errors. The statistics in Figure 15 for the observed data show differences in the mean values and standard deviations at the two locations, suggesting that indeed errors have different spatial patterns. As the data tend to cluster about 1, the two observed data sets predominantly agree on the magnitude of rainfall, although the long tails of the error distribution show that on occasions large errors can be identified.
 A log-normal distribution is fitted to the errors shown in Figure 15. The log-normal distribution parameters are estimated and used to construct an ensemble of rain rates at each radar pixel as follows. The distribution of radar to gauge rainfall ratios is divided into 100 percentiles. Then the ratio for each percentile is used to multiply the radar rain values, providing 100 rainfall values (one for each percentile) at each radar pixel. For each radar pixel, the error distribution derived at the nearest rain gauge is used. Figure 14 shows the areas (colored) for which error characteristics are assumed constant in space based on the nearest rain gauge behavior.
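 The construction of the pixel-level rainfall ensemble can be sketched as below. This is a hedged illustration: the function names are hypothetical, the log-normal parameters are estimated simply from the mean and standard deviation of the logarithm of the ratios, and the 100 percentile levels are assumed to be the midpoints 0.5%, 1.5%, ..., 99.5%, since the text does not state which levels were used. Following the text, each percentile ratio multiplies the radar rain value.

```python
import numpy as np
from statistics import NormalDist

def fit_lognormal(ratios):
    """Estimate log-normal parameters (mu, sigma) from positive ratios."""
    logs = np.log(np.asarray(ratios, dtype=float))
    return float(logs.mean()), float(logs.std())

def percentile_ratios(mu, sigma, n_percentiles=100):
    """Ratio at each percentile of the fitted log-normal distribution.

    Assumption: levels are bin midpoints (0.5%, 1.5%, ..., 99.5%).
    """
    levels = (np.arange(n_percentiles) + 0.5) / n_percentiles
    z = np.array([NormalDist().inv_cdf(float(p)) for p in levels])
    return np.exp(mu + sigma * z)

def rain_ensemble(radar_rain, ratios_by_percentile):
    """Scale radar rainfall by each percentile ratio.

    Returns an array with a leading percentile axis, followed by the
    shape of radar_rain.
    """
    radar_rain = np.asarray(radar_rain, dtype=float)
    ratios = np.asarray(ratios_by_percentile, dtype=float)
    return ratios.reshape((-1,) + (1,) * radar_rain.ndim) * radar_rain

# Synthetic ratios for one gauge, and a small synthetic radar field.
demo_mu, demo_sigma = fit_lognormal(np.exp([-1.0, 0.0, 1.0]))
demo_percentile_ratios = percentile_ratios(demo_mu, demo_sigma)
demo_ensemble = rain_ensemble(np.ones((2, 3)), demo_percentile_ratios)
```

 In the setup described in the text, the ratios applied at a given radar pixel would come from the distribution fitted at its nearest rain gauge, so `percentile_ratios` would be evaluated once per gauge region.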
 Having derived rainfall error estimates at each radar pixel, which are expressed as 100 values of rainfall from the lowest to the highest, the next task is to estimate the error in the area-mean rainfall. This requires assumptions about the spatial correlation of the individual pixel errors. As our goal is to span the widest range of possibilities, we will assume the worst case scenario of maximum correlation. In other words, we assume that whenever the largest possible error occurs at one pixel, the largest error in the same direction occurs at all radar pixels. This is an extremely simple assumption and will maximize the possible error in the area-mean rainfall, consistent with our goal to maximize ensemble spread. Using this assumption, 100 values of area-mean rainfall are derived by simply averaging the pixel-rainfall rates within each percentile, i.e., the first percentile of the area-mean rainfall distribution is simply the average of all first-percentile values at each pixel and so on stepping through all percentiles. Figure 16 shows the 100 cumulative rainfall time series derived in this way for TWP-ICE. For comparison, the figure includes the best estimate rainfall time series as derived by Xie et al. [2010], which falls close to the 50th percentile, as might be anticipated from the way the distribution was constructed. While the error estimates allow for a large range of possible rainfall values, the central 50% of the distribution, between the 25th and 75th percentiles, spans only a limited range of rainfall.
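 Under the maximum-correlation assumption, the area-mean calculation reduces to an average over pixels taken separately for each percentile. The sketch below illustrates this; the function names and the array layout (percentile, time, pixel) are hypothetical, and the input values are synthetic.

```python
import numpy as np

def area_mean_by_percentile(pixel_ensemble):
    """Area-mean rainfall for each percentile and time.

    pixel_ensemble: array (n_percentile, n_time, n_pixel). Averaging over
    the pixel axis within each percentile corresponds to assuming that
    errors at all pixels are perfectly correlated.
    """
    return np.asarray(pixel_ensemble, dtype=float).mean(axis=2)

def cumulative_rainfall(area_mean):
    """Cumulative rainfall time series for each percentile."""
    return np.cumsum(area_mean, axis=1)

# Synthetic ensemble: 3 percentiles, 2 times, 4 pixels; the rainfall at
# every pixel and time equals the percentile index (0, 1, 2).
demo_pixels = np.arange(3, dtype=float)[:, None, None] * np.ones((3, 2, 4))
demo_area_mean = area_mean_by_percentile(demo_pixels)
demo_cumulative = cumulative_rainfall(demo_area_mean)
```

 If pixel errors were instead treated as independent, they would partly cancel in the area mean, so the spread between the lowest and highest area-mean percentiles would be narrower than under the assumption used here.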
A2 Deriving the Large-Scale Atmospheric State
 Each of the 100 rainfall scenarios derived above is used separately in the variational analysis algorithm of Zhang et al. All other observations, such as thermodynamic variables, horizontal winds, and radiation terms, are unchanged and are the same for each scenario. This produces 100 separate forcings that are all equally possible given the uncertainty in area-mean rainfall. A higher (lower) percentile corresponds to stronger (weaker) surface precipitation and generally stronger (weaker) vertical motion. The characteristics of the vertical motion for the active and suppressed periods are discussed in the main text.
 We investigated whether the other variational analysis inputs should also be modified to be more physically consistent with the altered rainfall. For example, an estimate of rainfall error has been used to derive alternative rainfall time series, but increased rainfall may, in the simplest terms, also be associated with more deep cloud and therefore reduced top-of-the-atmosphere longwave radiation, which is also an input to the variational analysis. Sensitivity studies in which the radiation was varied in conjunction with rainfall showed little impact on the resulting large-scale atmospheric state. This supports Zhang et al., who suggested that rainfall provides the largest contribution in the variational analysis.
 The 100 large-scale data sets so derived are used to provide forcing data for SCM and CRM as described in the main text.
 Davies and Jakob are supported by the Office of Science (BER), U.S. Department of Energy, under grant DE-SC0002731. Many of the other coauthors also participated through support from the U.S. Department of Energy Atmospheric System Research Program. V. Larson and B. Nielsen are grateful for financial support from the United States Department of Energy (grants DE-SC0006927 and DE-SC0008668) and the National Science Foundation (grant AGS-0968640). Support for X. Liu was provided by the U.S. Department of Energy (DOE), Office of Science, Atmospheric System Research (ASR) program. The Pacific Northwest National Laboratory is operated for DOE by Battelle Memorial Institute under contract DE-AC06-76RLO 1830. Dr. Weiguo Wang is partly supported by the National Natural Science Foundation of China under Grant No. 41075039. The contributions of S. Xie to this work were performed under the auspices of the U.S. Department of Energy (DOE), Office of Science, Office of Biological and Environmental Research by Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344 and supported by the Atmospheric Radiation Measurement Program of the Office of Science at the DOE.