Comparing observations collected during the TWP-ICE field experiment with output from models on differing grids is nontrivial. First, there are spatial and temporal inconsistencies between the observations and the models. For example, retrievals of liquid and ice water contents are generally point measurements with high temporal resolution, while a model cloud quantity generally represents an in-cloud or grid-box mean. Second, some model variables may not represent exactly the same quantity as the observations or observational retrievals. For example, IWC includes snow in some models but not in others. Finally, the observations used in this study are mostly area means (i.e., averages over the variational analysis polygon in Figure 1). Unless explicitly noted, we therefore compare them with the average of the model grid points residing in a box covering the polygon (Figure 1). The number of model grid points within the box ranges from 2 for the coarsest model to over one hundred for models at ∼20 km resolution. The total area sampled in the models varies from ∼90,000 to 150,000 km², while the observational area is ∼70,000 km²; these differences are small compared to the variations among the models. Before investigating model performance in the different regimes, we first present the time evolution of some key variables over the whole period. The comparison focuses on the period from 19 January to 12 February 2006.
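The box averaging described above can be sketched as follows. This is a minimal illustration, not the processing code used in this study: it assumes model output on a regular latitude-longitude grid and applies cos(latitude) area weights, which is itself an assumption (a plain mean of the grid points differs little over a box this small). The function name `box_mean` and the grid and box bounds in the example are hypothetical.

```python
import numpy as np

def box_mean(field, lat, lon, lat_bounds, lon_bounds):
    """Area-weighted mean of `field` over the grid points inside a lat/lon box.

    field: array of shape (..., nlat, nlon); lat (nlat,) and lon (nlon,) in degrees.
    lat_bounds, lon_bounds: (min, max) tuples in degrees (hypothetical box).
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    in_lat = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    in_lon = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    if not in_lat.any() or not in_lon.any():
        raise ValueError("no grid points fall inside the box")
    # Select the rows (latitudes) first, then the columns (longitudes)
    sub = np.asarray(field, dtype=float)[..., in_lat, :][..., in_lon]
    # cos(latitude) approximates the relative area of a regular lat-lon cell
    w = np.cos(np.deg2rad(lat[in_lat]))[:, None] * np.ones(in_lon.sum())[None, :]
    return (sub * w).sum(axis=(-2, -1)) / w.sum()

# Hypothetical 1-degree grid and box, for illustration only
lat = np.arange(-20.0, -5.0, 1.0)
lon = np.arange(125.0, 136.0, 1.0)
field = np.full((4, lat.size, lon.size), 3.0)  # 4 time steps of a uniform field
series = box_mean(field, lat, lon, (-14.0, -11.0), (129.0, 133.0))
```

Because the weights only rescale points within the box, a spatially uniform field returns its own value at every time step, which is a quick sanity check on any such averaging routine.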
3.1. Model General Behavior
 Figure 2 shows the evolution of winds and moisture at 700 hPa from the ECMWF analyses, the outgoing longwave radiation (OLR) and surface precipitation from satellites, and the same fields from one example model forecast. The TWP-ICE period coincided with a major MJO event. The wet period (19–25 Jan) was an active monsoon period dominated by robust oceanic convection, with a mesoscale convective system (MCS) developing into a tropical low, informally dubbed “Landphoon John”, that moved south inland and was near 20°S around 30 Jan (Figure 2a). As this tropical low moved south, dry mid-tropospheric air intruded from the midlatitudes and deep convection was suppressed over Darwin (near 12°S), with cumulus congestus dominating during this dry period (26 Jan–2 Feb). The third period is a short, three-day clear period without precipitation near Darwin. The fourth period (6–13 Feb) is a monsoon break period with more localized afternoon thunderstorms associated with the land-sea breeze and several squall lines crossing Darwin, accompanied by a strong high-pressure system residing inland to the south. Corresponding to these regimes, OLR and precipitation vary concomitantly, with a clear rain shaft following the southward drift of the tropical low (Figure 2b). During the break period, thunderstorms occurred in the afternoon and early evening near the coast (Figure 2b). Further inland, a strong diurnal cycle of land surface temperature is evident in the OLR field during the break period.
Figure 2. (a) Time-latitude plot of wind vectors and specific humidity (g/kg, shaded) at 700 hPa from the ECMWF operational analysis. (b) Time-latitude plot of OLR (shaded) from the European Union CLoud Archive User Service (CLAUS) with precipitation (mm/h) from TRMM 3B42 overlaid (contours) during the same period. (c) Same as Figure 2a but from the 25-km CAM4 day-2 forecasts. (d) Same as Figure 2c but for OLR and precipitation. The four periods in the TWP-ICE region are also labeled. All data are 3-hourly.
 The time-latitude plots from the 25-km CAM4 (Figures 2c and 2d) indicate that the CAPT approach yields realistic large-scale environments and captures the transitions between these regimes well. For example, the strong westerlies accompanying the passage of the active MJO phase and the southward-moving tropical low are well captured. The model also captures the mid-tropospheric dry air intrusion near Darwin around 3 Feb. In the break period, the large-scale forcing is relatively weak. The model captures the land-sea temperature contrast and triggers localized convection, although this convection is relatively weak and its timing differs somewhat from the observations. Other models also capture the overall large-scale environment well (not shown).
 Figure 3 shows the daily mean observed and modeled surface downward shortwave radiation (SWDSFC), OLR, ice water path (IWP), liquid water path (LWP), water vapor path (WVP), and total precipitation rate (PR). Overall, the models capture the inter-regime differences well, such as the lower SWDSFC and OLR in the wet regime than in the dry and break regimes. All models tend to underestimate WVP in the wet and break periods. Most models predict the maximum precipitation near 25 Jan, one day later than observed. This shift in the precipitation maximum was also noted in Boyle and Klein  and was attributed to the operational analysis used: the maximum vertical motion in the ECMWF operational analysis near the Darwin region occurs around one day later than in the variational analysis (not shown). Observed OLR increases from ∼150 W m−2 in the wet period to ∼250 W m−2 in the clear period and decreases slightly during the break period. Most models tend to overpredict OLR, except HIRAM and AM3, which tend to underpredict it, especially during the dry and break periods. Model SWDSFC is generally within 50 W m−2 of the observed value and also shows a one-day shift of the minimum, consistent with the precipitation shift. The two observational LWP retrievals, especially the one including drizzle and rain, vary in step with PR. For example, LWP reaches 0.4 mm near 23 Jan, when PR is 75 mm day−1. LWP differs by up to two orders of magnitude among the models, with GAMIL and MetUM having LWP up to 1.5 and 0.9 mm, respectively. Observational retrievals of IWP are large (0.1 to 0.15 mm) during the wet and break periods, with small IWP during the dry period. Though most models capture this inter-regime variation, IWP also differs by up to two orders of magnitude among the models. The three GFDL models, IFS, and MetUM have large IWP in the wet period. Overall, this is consistent with Waliser et al. , who found large variations of IWP and LWP among the models they compared.
One source of the substantial discrepancies among the cloud simulations is the cloud scheme, which is designed and implemented differently in each model.
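A timing offset such as the one-day shift of the precipitation maximum noted above could be quantified with a lagged correlation between the observed and modeled daily mean series. The sketch below is a hypothetical helper (`best_lag`), not the method used in this study or in the studies it cites; it assumes two equally long, gap-free daily series, and the example data are synthetic.

```python
import numpy as np

def best_lag(obs, model, max_lag):
    """Return (lag, r): the lag in samples that maximizes the correlation.

    A positive lag means the model series trails the observations,
    i.e., model[t + lag] is paired with obs[t].
    """
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    best, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = obs[: obs.size - lag], model[lag:]
        else:
            a, b = obs[-lag:], model[: model.size + lag]
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:  # keep the first lag with the highest correlation
            best, best_r = lag, r
    return best, best_r

# Synthetic daily precipitation (mm/day) and a model copy delayed by one day
obs = np.array([2.0, 3.0, 5.0, 12.0, 40.0, 75.0, 30.0, 10.0, 6.0, 4.0, 3.0, 2.0, 2.5, 1.0])
model = np.concatenate([[obs[0]], obs[:-1]])
lag, r = best_lag(obs, model, max_lag=3)
```

With daily data the lag resolution is one day, so this kind of diagnostic can confirm a shift of that size but not resolve sub-daily timing errors.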
Figure 3. Time series of daily mean surface downward shortwave (SWDSFC), outgoing longwave radiation (OLR), ice water path (IWP), liquid water path (LWP), water vapor path (WVP), and total precipitation rate (PR) from models and observational estimates. Error bars in the IWP panel show the standard deviation of five IWP retrievals from CRED. The thick solid and dashed black lines in the LWP panel are the two observational retrievals from CRED. The four distinct periods are also labeled. Dashed lines are fine-resolution model results.
 Cloud amounts also differ significantly among the models (Figure 4). Note that model cloud fraction represents the fraction of a model grid box occupied by clouds, while observational cloud fraction is derived from the cloud occurrence frequency at a single point [cf. Xie et al., 2008]. For this reason, this comparison can only be interpreted qualitatively. Note also that the model cloud fraction is taken from the closest grid point. There are again four distinct regimes in the observed clouds, including a wet period with extensive clouds extending from the surface to the upper troposphere; a suppressed, dry period with congestus and an upper-level cirrus deck advected from the tropical low to the south; and a break period with more intermittent deep convection and a small coverage of upper-level cirrus. We found that resolution has only minor impacts on the cloud fields (not shown), which implies that the model physics (most probably the cloud and convection schemes) dominates the model cloud fields. Models generally capture the transition from strongly forced to weakly forced large-scale conditions, but only some capture the transition from the dry to the break conditions. For example, GAMIL, MetUM, and GME cannot depict the congestus associated with the dry air intrusion during the dry period. Compared with AM2, AM3 has more clouds in the upper troposphere (∼50 hPa). JMA, CAM4, and IFS capture the shallow convection (congestus) during the dry period relatively well compared with the other models. In the break period, models capture the deep convection well, though some models cannot distinguish the cloud fields of the break and dry periods. Other fields, such as water vapor, temperature, and winds, were also compared (not shown), and there are generally no significant differences among the models. This is further evidence that the cloud and convection schemes are the primary source of the differences between the simulations.
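The point-based observational cloud fraction can be illustrated as follows: at each level, it is the fraction of profiler samples within an averaging window that detect cloud. This is a minimal sketch, not the procedure of Xie et al. [2008]; it assumes a precomputed boolean cloud mask (for example, from merged radar and lidar returns) on a regular time grid, and the helper name and window length are hypothetical.

```python
import numpy as np

def occurrence_frequency(cloud_mask, window):
    """Cloud fraction as cloud occurrence frequency at a single site.

    cloud_mask: boolean array (ntime, nlevel), True where cloud is detected.
    window: number of consecutive time samples per averaging interval.
    Returns an array (ntime // window, nlevel) of fractions in [0, 1];
    trailing samples that do not fill a whole window are dropped.
    """
    mask = np.asarray(cloud_mask, dtype=bool)
    nwin = mask.shape[0] // window
    trimmed = mask[: nwin * window].reshape(nwin, window, mask.shape[1])
    return trimmed.mean(axis=1)

# Six samples at two levels, averaged over windows of three samples
mask = np.array([[1, 0], [1, 0], [0, 0], [1, 1], [1, 1], [1, 0]], dtype=bool)
cf = occurrence_frequency(mask, window=3)
```

The key difference from a model grid-box cloud fraction is that this value measures how often cloud passes over one point, not how much of an area is covered at one instant, which is why the comparison is only qualitative.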
Figure 4. Time-pressure plot of cloud fraction from (a) MMCR and MPL, (b) GAMIL, (c) AM2, (d) AM3, (e) HIRAM, (f) CAM4, (g) MetUM, (h) GME, (i) JMA, and (j) IFS. Model results are fine resolution forecasts when available.
3.2. Regime Specific Evaluation
 The distinct cloud and precipitation characteristics of the three regimes (wet, dry, and break; Figures 3 and 4) provide a good test of model cloud and convection performance under different environmental conditions. In this section, model simulations in each regime are compared, using the available observations as references. Figure 5 shows the profiles of LWC, IWC, cloud fraction (CF), and relative humidity (RH) with respect to water for the three periods. Since cloud microphysical retrievals are available only in cloud, model in-cloud LWC and IWC (grid-box mean value divided by cloud fraction) are compared for consistency. As mentioned before, two LWC retrievals are shown here: one includes drizzle and rain in addition to cloud droplets, while the other includes only cloud droplets. Retrieved LWC is around 0.1–0.3 g kg−1 and decreases quickly to zero above 400 hPa in all three periods. Though each model has similar LWC profiles in the three periods, inter-model LWC differences are large in every period. GAMIL predicts maximum LWC near 600 hPa in the wet and dry periods, and CAM4 tends to have large LWC near the surface. MetUM predicts relatively large LWC (up to 0.8 g kg−1), and GME has negligible LWC. Unlike the observational retrievals, CAM4, MetUM, and GAMIL have substantial LWC above 400 hPa.
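The conversion from grid-box mean to in-cloud condensate used above amounts to a division by cloud fraction. The sketch below shows one way to do it while avoiding division by near-zero cloud fractions; the helper name and the `min_cf` threshold are hypothetical choices, not values taken from this study.

```python
import numpy as np

def in_cloud(grid_mean, cloud_fraction, min_cf=0.01):
    """In-cloud condensate (same units as grid_mean) = grid-box mean / cloud fraction.

    Levels with cloud fraction below `min_cf` (hypothetical threshold) are
    returned as NaN, because dividing by a tiny fraction gives unphysical values.
    """
    q = np.asarray(grid_mean, dtype=float)
    cf = np.asarray(cloud_fraction, dtype=float)
    out = np.full(np.broadcast(q, cf).shape, np.nan)
    np.divide(q, cf, out=out, where=cf >= min_cf)
    return out

# Grid-box mean LWC (g/kg) and cloud fraction at three levels
lwc_in_cloud = in_cloud([0.02, 0.05, 0.001], [0.1, 0.5, 0.0])
```

Masking rather than clipping the cloud fraction keeps nearly cloud-free levels out of the mean profiles instead of inflating them.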
Figure 5. Mean profiles of (first column) in-cloud LWC, (second column) IWC, (third column) cloud fraction, and (fourth column) relative humidity with respect to water for the (top) wet, (middle) dry, and (bottom) break period from models and available observational estimates. The thick solid and dashed black lines in the first column are the two observational retrievals from CRED. Error bars in the second column show the standard deviation of five IWC retrievals from CRED. Dashed lines are fine-resolution model results. Note the different x-axis scale for cloud fraction.
 During the wet period, IWC retrievals reach ∼0.1 g kg−1 near 300 hPa, with the mean and standard deviation reaching up to 0.5 g kg−1 near 100 hPa. One retrieval, specifically designed for cirrus [Deng and Mace, 2008], yields large IWC at these upper levels. We include this retrieval because there are extensive anvils and upper-level cirrus during the wet period, which may be significantly underestimated by the MMCR owing to signal attenuation, as discussed later. In the dry period, IWC reaches up to 0.07 g kg−1 near 200 and 400 hPa, corresponding to the upper-level cirrus deck and to some cumulus congestus reaching the middle troposphere (cf. Figure 4a), respectively. IWC retrievals in the break period reach 0.3 g kg−1, with a larger standard deviation than in the wet and dry periods. The larger IWC in the break period than in the wet period seems surprising at first sight. We note that the continental-type convection during the break period sometimes has more intense measured reflectivity than that in the wet period. These high reflectivity values are likely associated with hail, and it is possible that the retrievals overestimate the IWC in these very high reflectivity regions. May and Ballinger  also note that convection in the break period tends to be more intense, with higher reflectivity and higher cell heights. As in the IWP comparison, IWC differs significantly among the models, with HIRAM having the largest IWC. MetUM, IFS, and the GFDL models generally have larger IWC than the other models, and GME has the smallest IWC. The much smaller LWC and IWC in GME arise because the convective cloud contribution is excluded. IWC extends below the melting level in MetUM, suggesting that the ice and snow melting treatment in this model may differ from the instantaneous melting used in the other models. Overall, as with LWP and IWP, LWC and IWC differ by up to two orders of magnitude among the models, consistent with previous studies [Waliser et al., 2009].
Nevertheless, observational retrievals with an uncertainty estimate still provide a useful constraint and guidance for model development.
 In the wet period, most models have a top-heavy cloud fraction approaching 40–90% near 150 hPa, while the observed CF decreases significantly above 250 hPa. This is probably due to significant attenuation of the MMCR signal in the presence of deep convective clouds. Satellite images indicate extensive upper-level clouds over the Darwin region in the wet period and suggest that the abrupt CF drop-off above 250 hPa is somewhat unrealistic. JMA has a CF minimum near 400 hPa, coincident with the model's dryness in the middle and upper troposphere compared with the other models and the observations. Model RH is generally within 10% of the observed, except in the middle and upper troposphere for JMA and above 200 hPa for AM2 and GME. GAMIL is saturated near 400 hPa because of a ∼3 K cold bias there (Figure 6). In the dry period, the observed cloud fraction shows an obvious dipole corresponding to shallow congestus and the upper-level cirrus deck. Models tend to overpredict the middle- and upper-level clouds by a factor of 2–4 during this period. This suggests that models have difficulty simulating congestus, with convective plumes penetrating into the dry layer too frequently. Observed RH decreases quickly from 90% in the boundary layer to ∼60% above 700 hPa in the dry period as a result of the dry air intrusion. The RH spread among the models is larger during the dry period, especially in the middle troposphere, with an overall moist bias in most models. During the break period, the MMCR and MPL measured the smallest CF, with a maximum of 15% near 300 hPa. Models also have lower CF in this period, although AM3 predicts CF up to 40% near 200 hPa. RH in the break period varies with height as observed, but most models show a dry bias, opposite to the moist bias in the dry period.
Figure 6. Mean profiles of (left) temperature bias and (right) water vapor mixing ratio relative bias with respect to the variational analysis for the (top) wet, (middle) dry, and (bottom) break period. The black thick lines denote the bias of the ECMWF analysis relative to the variational analysis. Dashed lines are fine resolution model results.
 The RH comparison in Figure 5 indicates that the models have different temperature and moisture profiles. Systematic model temperature and moisture biases can be detected with the CAPT approach [Willett et al., 2008; Martin et al., 2010]. Figure 6 shows the temperature and moisture biases of the model forecasts relative to the variational analysis over the three periods. Note that the RH bias depends on both the temperature and the moisture biases, because the saturation vapor pressure depends on temperature through the Clausius-Clapeyron equation. Some overall features of Figure 6 are summarized here. First, the ECMWF analysis, from which the models are initialized, is 10–20% drier than the variational analysis during the wet and break periods. Possible reasons for this discrepancy include a different handling of observation errors for the microwave retrievals, different radiosonde bias corrections, a different model background, and the fact that the ECMWF analysis also includes information from larger scales. This dryness in the initial conditions may contribute to the dryness in the model forecasts. Second, all models underestimate moisture below 600 hPa in the wet and break periods. In contrast, most models (except IFS and CAM4) tend to have a moist bias above ∼700 hPa in the dry period (up to 80% near 300 hPa for GAMIL and AM3). This is consistent with Figure 4, which shows that IFS and CAM4 simulate shallow congestus well, whereas GAMIL and AM3 forecast too much cloud around 300 hPa. IFS and CAM4 use an entrainment rate that depends on the free-tropospheric humidity [Bechtold et al., 2008; Neale et al., 2008], which increases the sensitivity of convective development to the environmental humidity. In most of the other models, the convection depth is overestimated, so moisture is transported to higher levels than observed (not shown), resulting in a moist bias. It thus appears that the moist bias in the dry period is related to poor forecasts of cumulus congestus in some models.
Temperature biases are more diverse among the models and are generally less than 2 K in the troposphere, except in HIRAM and GAMIL. HIRAM tends to be cold in the upper troposphere near 200 hPa, and GAMIL has a maximum cold bias of up to 3 K near 400 hPa. Such temperature and moisture biases are related to the vertical heating and moistening structures discussed later.
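The dependence of RH biases on both temperature and moisture biases can be made concrete. RH is the ratio of the water vapor mixing ratio q to its saturation value q_s(T, p), and q_s rises by roughly 6% per kelvin near the surface to over 10% per kelvin in the cold upper troposphere, so a cold bias alone raises RH. The sketch below assumes the Bolton (1980) fit for saturation vapor pressure over liquid water; the helper names are hypothetical. As an example, at 400 hPa and about 253 K, a 3 K cold bias with unchanged q raises RH by roughly 30% of its value, which illustrates how a cold bias of this size can bring a layer close to saturation.

```python
import numpy as np

EPS = 0.622  # ratio of the gas constants of dry air and water vapor

def sat_mixing_ratio(T, p):
    """Saturation mixing ratio over liquid water (kg/kg); T in K, p in hPa.

    Uses the Bolton (1980) fit: es = 6.112 exp(17.67 Tc / (Tc + 243.5)) hPa.
    """
    tc = np.asarray(T, dtype=float) - 273.15
    es = 6.112 * np.exp(17.67 * tc / (tc + 243.5))
    return EPS * es / (p - es)

def rh(q, T, p):
    """Relative humidity with respect to water (fraction)."""
    return q / sat_mixing_ratio(T, p)

def rh_bias_split(q, T, p, dq, dT):
    """RH bias from (dq, dT), plus the moisture-only and temperature-only parts."""
    base = rh(q, T, p)
    total = rh(q + dq, T + dT, p) - base
    moist = rh(q + dq, T, p) - base
    temp = rh(q, T + dT, p) - base
    return total, moist, temp

# Example: air at 400 hPa and 253.15 K with RH = 70%
q0 = 0.7 * sat_mixing_ratio(253.15, 400.0)
total, moist, temp = rh_bias_split(q0, 253.15, 400.0, dq=-0.1 * q0, dT=-3.0)
ratio = rh(q0, 250.15, 400.0) / rh(q0, 253.15, 400.0)  # effect of a 3 K cold bias
```

Splitting the bias this way shows that a model can be too moist in RH while being too dry in q, if its cold bias is large enough; the two parts do not add exactly because the relation is nonlinear.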
 Vertical heating structures differ between convective and stratiform precipitation. They are a good test of the cloud model within the convective parameterizations, i.e., of how the heating and moistening are distributed in the vertical [Bechtold et al., 2001]. Following Lin et al. , Figure 7 shows the precipitation-normalized Q1 (Q1 divided by total precipitation), the precipitation-normalized convective heating (convective heating divided by convective precipitation), the stratiform heating (not normalized, because some models have very small large-scale precipitation), and the cumulus mass flux during the three periods. Q1 is from the variational analysis [Xie et al., 2010], and the convective and stratiform heating estimates are from Courtney Schumacher (personal communication, 2012). These retrievals are subject to an uncertainty of at least 25% associated with the surface precipitation retrieval from the C-POL radar [Fridlind et al., 2012]. During the wet period, all models capture the top-heavy Q1 profile quite well, and the differences among the models are small. This is because the large-scale forcing in the active monsoon environment is strong and these models capture the response to such strong forcing well. Most of the observed Q1 comes from the convective component, which has a maximum near 600 hPa. The model convective heating maximizes between 500 and 700 hPa in all models except JMA, where it maximizes near 400 hPa. HIRAM tends to have bottom-heavy convective heating. The observationally retrieved stratiform heating reaches 5 K day−1 and has a distinct structure, with heating aloft and cooling below. Stratiform heating is more diverse among the models. As in the observations, IFS, MetUM, AM3, and HIRAM have condensational and depositional heating above and evaporative and melting cooling below. JMA has stratiform cooling aloft near 400 hPa, which effectively compensates for its overestimated convective heating at that height.
Convective mass flux profiles are generally similar in shape to the convective heating profiles, with the heating maximum slightly above the mass flux maximum. However, the magnitude of the convective mass flux differs significantly among the models, with GAMIL having the smallest values (∼5 g m−2 s−1) and MetUM and JMA having larger values (∼50 g m−2 s−1). MetUM and CAM4 have a mass flux maximum near the freezing level, and JMA has a shallow mode in its mass flux.
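Normalizing a heating profile by precipitation, as in Figure 7, removes the dependence on rain amount so that profile shapes can be compared across models and periods; comparing the pressure of the maxima then shows, for example, whether heating peaks above the mass flux. The sketch below is a minimal illustration with made-up profiles; the helper names and the `min_precip` cutoff (a safeguard for cases with very little rain, where the normalized profile becomes meaningless) are hypothetical.

```python
import numpy as np

def normalized_profile(heating, precip, min_precip=0.5):
    """Heating profile (K/day) divided by a precipitation rate (mm/day).

    Returns None when precipitation is below `min_precip` (hypothetical cutoff),
    since dividing by a tiny rain rate gives an out-of-bounds profile.
    """
    if precip < min_precip:
        return None
    return np.asarray(heating, dtype=float) / precip

def level_of_max(profile, pressure):
    """Pressure level (hPa) at which a profile reaches its maximum."""
    return np.asarray(pressure, dtype=float)[np.nanargmax(profile)]

# Illustrative profiles on pressure levels (not data from this study)
p = np.array([900.0, 800.0, 700.0, 600.0, 500.0, 400.0, 300.0, 200.0])
heating = np.array([1.0, 3.0, 5.0, 7.0, 6.0, 4.0, 2.0, 0.5])       # K/day
mass_flux = np.array([10.0, 25.0, 40.0, 35.0, 25.0, 15.0, 8.0, 2.0])  # g m-2 s-1
q1_norm = normalized_profile(heating, precip=10.0)
```

Because pressure decreases with height, a heating maximum "above" the mass flux maximum appears as a smaller pressure value.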
Figure 7. Mean profiles of (first column) total precipitation normalized Q1, (second column) convective precipitation normalized convective heating, (third column) stratiform heating, and (fourth column) convective mass flux for the (top) wet, (middle) dry, and (bottom) break period from models and available observational estimates. Dashed lines are fine resolution model results. Note the different x axis scale for the third and fourth columns.
 In the dry period, deep convection is inhibited and congestus extends to 600 hPa (Figure 4). Observed Q1 shows a bottom-heavy heating structure associated with the shallow convection. Model Q1 profiles are more varied than in the wet period, with IFS, CAM4, and 50-km HIRAM capturing the bottom-heavy profile relatively well. These models also have small moisture biases above 700 hPa (Figure 6). In contrast, models with deep convective heating and deep convective mass flux also have convective moistening above 700 hPa (not shown), which contributes to their moist bias. This is a good example of the close connection between convective parameterization and a model's systematic biases. Compared with the observed Q1, the observed convective heating peaks at a higher level (700 hPa). Most models capture the observed convective heating profile to some extent, but JMA still places the convective heating too high. Consistent with Derbyshire et al. , the models whose entrainment rate accounts for environmental moisture (IFS and CAM4) are able to capture the bottom-heavy heating profile owing to their sensitivity to mid-level humidity. Again, stratiform heating differs significantly among the models. Furthermore, most models overestimate the observed stratiform heating, which is negligible, and fail to capture the characteristic shape of the stratiform heating profile. Convective mass flux has a bottom-heavy profile associated with the inhibited convective depth in most models, except JMA, which has a deep mode in addition to a well-defined shallow mode. Consistent with their convective heating profiles, CAM4 and IFS have the most bottom-heavy mass flux profiles.
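Why humidity-dependent entrainment makes convection more sensitive to a dry mid-troposphere can be illustrated with a toy entraining plume. The sketch below is not the IFS or CAM4 scheme: it dilutes the plume's moist static energy toward the environmental value with an entrainment rate that increases as the environment dries (a hypothetical linear form), uses idealized, made-up profiles (constant saturation moist static energy; a 2-km moist layer under a dry layer in the dry-intrusion case), and stops the plume where it is no longer more energetic than a saturated environment, a crude proxy for the level of neutral buoyancy.

```python
import numpy as np

def plume_top(z, h_env, h_sat, rh_env, h_base, eps0=5.0e-4):
    """Highest buoyant level (m) of a toy entraining plume.

    Moist static energy h (kJ/kg) is diluted as dh_u/dz = -eps (h_u - h_env),
    with eps = eps0 * (1.3 - RH) in 1/m (hypothetical form: drier air, more mixing).
    """
    h_u = h_base
    for k in range(len(z) - 1):
        dz = z[k + 1] - z[k]
        eps = eps0 * (1.3 - rh_env[k])           # entrainment rate at level k
        h_u = h_u - eps * dz * (h_u - h_env[k])  # forward-Euler dilution step
        if h_u < h_sat[k + 1]:                   # no longer buoyant at level k + 1
            return z[k]
    return z[-1]

z = np.arange(0.0, 15001.0, 100.0)  # height (m)
h_sat = np.full(z.size, 340.0)      # idealized saturation MSE (kJ/kg)

# Moist case: RH 90% and h_env = 335 kJ/kg at all levels
top_moist = plume_top(z, np.full(z.size, 335.0), h_sat, np.full(z.size, 0.9), h_base=350.0)

# Dry-intrusion case: same boundary layer, but RH 30% and h_env = 320 kJ/kg from 2 km up
dry = z >= 2000.0
top_dry = plume_top(z, np.where(dry, 320.0, 335.0), h_sat, np.where(dry, 0.3, 0.9), h_base=350.0)
```

With these illustrative numbers, the plume in the dry-intrusion case stops near 2.4 km while the moist-case plume reaches about 5.4 km. Only the contrast is meaningful: the dry layer both lowers the environmental moist static energy and, through the humidity-dependent entrainment, mixes it into the plume faster, so convection stays shallow, congestus-like.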
 In the break period, the Darwin area is dominated by intense afternoon convection and squall lines forming on sea-breeze convergence lines as well as moving off the higher terrain inland (Figure 4). This continental-type convection is more intense and generally smaller in spatial extent than the oceanic convection of the wet period. These convective systems tend to have much less stratiform precipitation, as indicated by the negligible stratiform heating (Figure 7). The models respond reasonably well and simulate the observed total heating structure, which has cooling in the boundary layer and broad heating in the 300–600 hPa layer. This boundary layer cooling is probably due to rain evaporation in the relatively dry near-surface air. The 50-km HIRAM has a total precipitation rate of less than 1 mm day−1 in the break period, and its Q1 is dominated by longwave cooling in the troposphere (not shown). This is why its normalized Q1 profile is negative and out of bounds in Figure 7. As in the dry period, convective heating is more diverse among the models, suggesting that the model responses to the relatively weak forcing during the dry and break periods are more diverse. Stratiform heating becomes small (<1 K day−1), with structures that vary among the models. Compared to the dry period, the convective mass flux in the break period is about half as large, probably because convection occurs less frequently in the break period. Overall, most models capture the shapes of the convective heating associated with different types of convection to some extent, especially in the wet period. In contrast, the shape of the stratiform heating is more diverse among the models, and many models cannot capture the characteristic profile with heating aloft and cooling at low levels. This highlights the importance not only of the cloud and microphysics representation but also of the coupling between the convection and the cloud scheme.
 In all three periods (Figure 7), GAMIL has the smallest convective mass flux, with a bottom-heavy profile (approaching zero above 400 hPa). HIRAM also has relatively small convective mass flux and a bottom-heavy convective heating profile (approaching zero above 200 hPa). Such small convective heating in the upper troposphere is closely related to the upper-level cold bias in these two models (Figure 6). It is worth noting that the upper-level cold bias in climate simulations is generally not as large as in these short-term forecasts [Zhao et al., 2009].
3.3. Resolution Impact
 With increasing computer power, AGCMs are being run at higher resolutions. Some global operational forecast models are regularly run at a resolution of ∼20 km (Table 2). For climate models, there are also active studies aimed at understanding the benefits of resolution [Boyle and Klein, 2010, and references therein]. These studies suggest that topography is better resolved and local-scale circulations are better described at higher resolution. However, the overall improvement is moderate, and some inherent biases are not ameliorated by resolution alone. One relatively robust result from these studies is an increase of large-scale (L-S) precipitation at higher resolution [Lau and Ploshay, 2009; Boyle and Klein, 2010]. This is also the case here, with the fine-resolution forecasts generally having more L-S precipitation than the coarse-resolution forecasts (not shown).
 Because identical physics are used in the simulations at different resolutions, we expect similar cloud properties in the high- and low-resolution simulations. This is generally the case, especially for the vertical structures of IWC and LWC (Figure 5). For example, LWC peaks near 600 hPa in both IAP simulations, and the two JMA models produce very similar LWC, IWC, and CF profiles. High resolution does not necessarily reduce the temperature and moisture biases (Figure 6), which are mostly determined by the model's physical parameterizations. The vertical structures of the various heating terms and of the convective mass flux are also similar between the high- and low-resolution simulations (Figure 7). Stratiform heating generally increases with resolution, especially during the wet period (Figure 7). Fine resolution is more favorable for shallow convection, at least for IFS in terms of the mass flux (Figure 7). At higher resolution, more lower-boundary inhomogeneity, such as in terrain and soil moisture, is introduced. This increases the inhomogeneity of the near-surface thermodynamic fields and thus the probability of convection occurring, as well as variations in its intensity.
 The diurnal cycle of precipitation has long been a challenge for numerical models [Betts and Jakob, 2002; Dai and Trenberth, 2004; Yang and Slingo, 2001; Bechtold et al., 2004; Dirmeyer et al., 2012]. To analyze the impact of resolution on the precipitation diurnal cycle, we focus on a box (the dashed box in Figure 1) over the coastal and inland region near Darwin, where satellite observations show a relatively robust diurnal cycle during the break period (Figure 2b). TRMM retrievals reveal a late afternoon/early evening precipitation maximum (Figure 8). Most models have a robust precipitation diurnal cycle dominated by convective precipitation, peaking from near noon (low-resolution HIRAM, MetUM, CAM4, and GME) to late afternoon (JMA and IFS). Such a robust diurnal cycle of convection over land is closely related to the strong diurnal cycle of the surface sensible (Figure 8d) and latent (not shown) heat fluxes, though its phase may lag behind that of the heat fluxes. The diurnal cycle in AM2 is modulated by the relatively long relaxation time (∼12 h) used in RAS. Total precipitation is generally dominated by convective precipitation, except in the high-resolution JMA and HIRAM, which have substantial L-S precipitation (4 and 10 mm day−1, respectively) near midnight (Figure 8c). As a result, their total precipitation diurnal cycle is significantly modified. Convective precipitation extends into the nighttime in most models. Overall, the impact of resolution on the precipitation diurnal cycle is relatively small unless the resolution significantly modulates the precipitation partitioning (e.g., HIRAM and JMA). The 125-km IFS has both L-S and convective precipitation peaking in the evening, while the 25-km IFS precipitation is mainly convective, with a peak in the afternoon.
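Diurnal cycles such as those in Figure 8 are built by compositing a multi-day series by local hour. The sketch below is a minimal illustration, assuming sample times given as UTC hours of day and converting to Darwin local standard time (UTC+9:30); the helper name, bin width, and synthetic data are hypothetical, not the processing used for Figure 8.

```python
import numpy as np

def diurnal_composite(hours_utc, values, utc_offset=9.5, bin_width=3.0):
    """Mean value in local-time bins of width `bin_width` hours.

    hours_utc: hour of day (UTC) of each sample; utc_offset in hours.
    Returns (bin start times in local hours, mean per bin; NaN for empty bins).
    """
    local = (np.asarray(hours_utc, dtype=float) + utc_offset) % 24.0
    vals = np.asarray(values, dtype=float)
    edges = np.arange(0.0, 24.0 + bin_width, bin_width)
    idx = np.digitize(local, edges) - 1  # bin index for each sample
    means = np.full(edges.size - 1, np.nan)
    for i in range(edges.size - 1):
        sel = idx == i
        if sel.any():
            means[i] = np.nanmean(vals[sel])
    return edges[:-1], means

# Synthetic 3-hourly rain over 8 days, raining only at 09 UTC (18:30 local)
hours = np.tile(np.arange(0.0, 24.0, 3.0), 8)
rain = np.where(hours == 9.0, 10.0, 1.0)
starts, means = diurnal_composite(hours, rain)
peak_local = starts[np.nanargmax(means)]
```

Forgetting the half-hour offset, or compositing in UTC, would shift the apparent peak by up to several hours, which matters when model peaks are compared with a late afternoon/early evening observed maximum.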
The resolution dependence of the phase of the diurnal cycle can be attributed to the horizontally variable, resolution-dependent CAPE adjustment time scale [Bechtold et al., 2008], which in the IFS is O(20–60 min) at 25-km resolution and O(1–3 h) at 125-km resolution. A longer adjustment time scale favors a smoother evolution and shifts the precipitation peak from noon toward the afternoon and early evening. Overall, the model precipitation diurnal cycle over tropical land is dominated by the convective parameterization when L-S precipitation is negligible. As resolution increases, L-S precipitation may increase and modify the total precipitation diurnal cycle. Given the short time period and the small area considered in this study, these results need to be further evaluated and confirmed in future work.
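How a longer adjustment time scale delays the precipitation peak can be illustrated with a toy relaxation closure: CAPE is generated by an imposed diurnal forcing and removed on a time scale tau, with convective precipitation proportional to CAPE/tau. This is a generic first-order sketch, not the IFS closure of Bechtold et al. [2008]; the forcing shape (a half-sinusoid peaking at local noon) and the helper names are hypothetical. For a purely sinusoidal forcing, the peak lags by arctan(omega tau)/omega, about 0.7 h for tau = 40 min and 1.8 h for tau = 2 h, so the lag grows with tau but cannot exceed a quarter of a day.

```python
import numpy as np

def relaxed_precip(tau_h, period_h=24.0, dt_h=0.01, n_days=5):
    """Toy CAPE relaxation: dC/dt = F(t) - C/tau, precipitation P = C/tau.

    F(t) is a half-sinusoid peaking at 12 local time (hypothetical forcing).
    Returns (local hour, forcing, precip) for the last simulated day.
    """
    n = int(round(n_days * period_h / dt_h))
    t = np.arange(n) * dt_h
    forcing = np.maximum(0.0, np.cos(2.0 * np.pi * (t - 12.0) / period_h))
    c = 0.0
    precip = np.empty(n)
    for i in range(n):
        precip[i] = c / tau_h                     # precipitation at time t[i]
        c += dt_h * (forcing[i] - c / tau_h)      # forward-Euler step of CAPE
    last = t >= (n_days - 1) * period_h          # earlier days serve as spin-up
    return t[last] % period_h, forcing[last], precip[last]

def peak_hour(tau_h):
    """Local hour of maximum precipitation on the last simulated day."""
    hours, _, precip = relaxed_precip(tau_h)
    return hours[np.argmax(precip)]

short_tau_peak = peak_hour(40.0 / 60.0)  # adjustment time of ~40 min
long_tau_peak = peak_hour(2.0)           # adjustment time of ~2 h
```

In this toy, precipitation peaks when it catches up with the declining forcing, so a slower adjustment both damps and delays the response; it reproduces only the direction of the shift, not the magnitude seen in the forecasts.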
Figure 8. (a) Diurnal cycle of total precipitation rate averaged over the dashed box in Figure 1 during the break period (Feb 6–13). Black line denotes the precipitation rate from TRMM 3B42 retrievals. (b) Same as Figure 8a but showing the convective precipitation rate. (c) Same as Figure 8a but showing the large-scale precipitation rate. (d) Same as Figure 8a but showing the sensible heat flux. Dashed lines are fine resolution model results.