This study quantitatively evaluates the overall performance of nine single-column models (SCMs) and four cloud-resolving models (CRMs) in simulating a strong midlatitude frontal cloud system taken from the spring 2000 Cloud Intensive Operational Period at the Atmospheric Radiation Measurement (ARM) Southern Great Plains site. The evaluation data consist of a constrained variational analysis of the ARM observations, cloud data collected from the ARM ground-based active remote sensors (i.e., cloud radar, lidar, and laser ceilometers), and satellite retrievals. Both the selected SCMs and CRMs can typically capture the bulk characteristics of the frontal system and the frontal precipitation. However, there are significant differences in the detailed structures of the frontal clouds. Both CRMs and SCMs overestimate high thin cirrus clouds before the main frontal passage. During the passage of the front, which is accompanied by strong upward motion, CRMs underestimate middle and low clouds while SCMs overestimate clouds at levels above 765 hPa. All CRMs and some SCMs also underestimate the middle clouds after the frontal passage. There are also large differences in the simulated cloud condensates owing to differences in parameterizations; however, the spread among models is smaller for the CRMs than for the SCMs. In general, the CRM-simulated cloud water and ice are comparable with observations, while most SCMs underestimate cloud water. SCMs show huge biases in cloud ice, ranging from large overestimates to equally large underestimates. Many of these model biases can be traced to the lack of subgrid-scale dynamical structure in the applied forcing fields and the lack of organized mesoscale hydrometeor advection. Other potential reasons for these model errors are also discussed in the paper.
 Clouds play a fundamental role in climate change through their radiative feedbacks. The treatment of clouds and their interactions with radiation has long been recognized as one of the largest uncertainties in current climate models [e.g., Cess et al., 1990]. This is mainly because of the complexity of representing cloud dynamics, microphysics, and cloud radiative properties in these models. The lack of adequate observational data, such as the observations of the three-dimensional cloud structure and cloud microphysical properties, also poses a severe restriction on the development of physically based cloud schemes.
 To advance our scientific understanding of clouds and their interactions with radiation, and to improve the representation of clouds and radiation in climate models, the U.S. Department of Energy's Atmospheric Radiation Measurement (ARM) program has in the past few years devoted significant effort to obtaining accurate measurements of clouds in different climate regimes using active remote sensors, including laser ceilometers, micropulse lidars, and millimeter-wave cloud radars (MMCR) [Stokes and Schwartz, 1994; Ackerman and Stokes, 2003]. This effort was especially intensive during its Spring 2000 Cloud Intensive Operational Period (IOP) at the Southern Great Plains (SGP) site. Several objective analysis approaches were also recently developed to process and integrate the data collected from these active remote sensors in order to provide the most accurate estimates of clouds and their microphysical properties. One valuable product of these efforts is the Active Remote Sensing of Clouds (ARSCL) product, which gives a best estimate of cloud location and radar echo characteristics above the central facility of each ARM research site [Clothiaux et al., 2000]. On the basis of the ARSCL data, a best estimate of cloud liquid and ice water content retrieved from cloud radar reflectivity is also available [Miller et al., 2003]. These data provide essential information for the evaluation and development of model cloud and radiation parameterizations.
 A natural first step toward improving model parameterizations is to systematically evaluate model performance in simulating clouds associated with various synoptic processes against available observations. Multimodel intercomparisons have proven useful for identifying strengths and weaknesses of model parameterizations by comparing results among different models and with observations [e.g., Ghan et al., 2000; Xie et al., 2002; Xu et al., 2002]. SCMs and CRMs are two useful modeling tools for testing and developing parameterizations and for isolating deficiencies, because the large-scale forcing terms (e.g., the large-scale advective tendencies of temperature and moisture, and vertical motion) can be specified from observations [Randall et al., 1996]. The large-scale forcing fields are derived from field measurements (e.g., ARM) through an objective analysis, such as the constrained variational analysis approach developed by Zhang and Lin [1997] and used in ARM [Zhang et al., 2001].
 With comprehensive cloud data available, the ARM Cloud Parameterization and Modeling (CPM) Working Group (WG) and the Global Energy and Water-Cycle Experiment (GEWEX) Cloud System Study (GCSS) WG 3 have recently made a concerted effort to assess the ability of models to reproduce a variety of cloud types observed during the ARM Spring 2000 Cloud IOP, namely the case 4 model intercomparison study (see http://science.arm.gov/wg/cpm/scm/scmic4/). A hierarchy of models, comprising SCMs, CRMs, mesoscale limited-area models (LAMs), and general circulation models (GCMs), is used in the case 4 study. The scientific goal of case 4 is to increase our understanding of the processes that determine cloud amount in observations and in models.
 This paper reports on the ability of SCMs and CRMs to simulate a strong, deep midlatitude frontal cloud system in the case 4 study. The evaluation of model simulations of shallow frontal clouds is described in a companion paper [Xu et al., 2005]. Frontal clouds are among the most commonly observed cloud systems in midlatitudes. A wide variety of cloud types is often observed in association with fronts, including cumulonimbus clouds, usually associated with cold fronts, and cirrus, stratus, and nimbostratus clouds associated with warm fronts. Mesoscale circulations driven by subgrid-scale dynamical and thermodynamical processes and by cloud microphysics also have a large influence on the frontal circulations and cloud fields. This complexity makes the parameterization of frontal clouds in large-scale models a challenging task.
 Evaluations of the ability of numerical models to simulate frontal clouds can be found in previous studies [e.g., Katzfey and Ryan, 1997; Klein and Jakob, 1999; Katzfey and Ryan, 2000; Ryan et al., 2000]. Klein and Jakob [1999] examined the frontal clouds simulated by the European Centre for Medium-Range Weather Forecasts (ECMWF) model by comparing a composite of about 200 cyclones over the Northwest Atlantic with a composite based on satellite measurements from the International Satellite Cloud Climatology Project (ISCCP) cloud products [Rossow and Schiffer, 1991, 1999]. They found that the ECMWF model could reasonably capture the position of clouds relative to the low-pressure center. However, compared with the ISCCP data, the model substantially overestimated optically thin cirrus clouds and optically thick low clouds and grossly underestimated middle-level clouds in the cold sector of the storm. Similar model deficiencies in simulating cirrus and middle-level clouds were also found by Ryan et al. [2000], who investigated the representation of frontal cloud systems by a hierarchy of models (SCMs, CRMs, LAMs, and GCMs) for a typical cold front observed over Australia. In their study, SCMs and CRMs were driven with boundary forcing data produced by a 20-km LAM. They found that increasing model resolution reduced the bias but did not eliminate it. The reasons for these model errors are still not completely understood. Owing to the lack of detailed measurements of clouds and related microphysical fields, Ryan et al. [2000] could not quantitatively compare the model simulations directly with observations.
 The objective of this study is to use the recently available ARM data, along with other observations (e.g., ISCCP clouds), to quantitatively evaluate the overall performance of model cloud parameterizations in simulating midlatitude frontal systems. We try to understand the processes that modulate frontal clouds and their associated microphysical properties, both in observations and in models. A strong frontal case during the period 1–3 March 2000 is selected for this purpose. Results from 9 SCMs and 4 CRMs are analyzed. We focus our discussion on features that identify the common strengths and weaknesses of the models. The impact of uniformly applying the ARM-observed large-scale forcing to SCMs and CRMs on the simulations is also discussed. Our purpose is to analyze and understand how clouds are generated in observations and parameterized in models, so as to gain insights for further improving the parameterization of clouds in climate models.
2. Model Description and Data
2.1. Model Description
Table 1 lists the 9 SCMs and 4 CRMs participating in the intercomparison study and the references that describe these models. The number of vertical levels in the SCMs ranges from 17 (CSU) to 53 (SCRIPPS), while the vertical grid spacing in the CRMs gradually increases from 100 m near the surface to about 500–1500 m above 5 km. A horizontal resolution of 2 km is used in most CRMs; the exception is ISU, which uses 3 km. All the CRMs used in this study are two-dimensional (2-D) and oriented in the east-west direction. Earlier studies [e.g., Grabowski et al., 1998; Khairoutdinov and Randall, 2003] showed that 2-D and 3-D CRMs generally produce similar statistical characteristics of many important fields (e.g., the mean temperature and moisture profiles), especially for two-dimensionally organized convective systems such as squall lines, although they produce noticeable differences in the evolution of these fields. As we show later, the 2-D CRMs have difficulty correctly capturing strong frontal systems, which generally result from the development of a baroclinic wave, because the 2-D framework lacks a north-south temperature gradient.
Table 1. Summary of SCMs Used in the Intercomparison Study
 Parameterizations of cloud condensates, cloud fraction, and cumulus convection in these SCMs and CRMs are listed in Table 2. Physically based prognostic schemes for cloud water and ice have been adopted in all SCMs except SCCM3_SIO, which uses a diagnostic cloud scheme. Prognostic cloud condensate schemes allow direct coupling among the model cloud, hydrological, dynamical, and radiative processes. For example, prognostic cloud schemes explicitly link stratiform and cirrus clouds to cumulus convection through the detrainment of convective condensate from cumulus updrafts. This direct coupling, however, is lacking in diagnostic approaches [e.g., Slingo, 1987], in which convective detrainment affects cloud condensate only indirectly, by changing the large-scale moisture field through re-evaporation of the detrained hydrometeors. In general, the prognostic cloud condensate schemes used here fall into three groups: (1) A single prognostic equation is used for both cloud water and cloud ice; the distinction between cloud liquid and cloud ice is determined by the grid-mean temperature, and rain and snow are diagnosed. This type of scheme follows the pioneering work of Sundqvist  with variations and improvements, such as the schemes used in GISS, McRAS, SCAM2, and SCRIPPS. (2) Separate prognostic equations are used to predict cloud water and ice, while rain and snow are diagnosed (e.g., ECHAM5, GFDL, and PNNL). (3) Bulk cloud microphysics equations are used to predict cloud water, ice, rain, and snow (CSU). These bulk microphysics schemes were originally developed for mesoscale models [Lin et al., 1983; Rutledge and Hobbs, 1984].
Table 2. Summary of Cloud and Convection Schemes Used in the Intercomparison Study
 Three types of cloud fraction schemes are used in these models. (1) Diagnostic cloud fraction schemes similar to that of Sundqvist  are used in GISS and SCAM2, and the diagnostic cloud fraction scheme developed by Slingo [1987] is used in SCCM3_SIO. These diagnostic schemes are mainly based on the grid-mean relative humidity (RH). However, the threshold RH for cloud formation differs among models: for example, GISS uses a threshold RH of 0.8, whereas SCAM2 uses 0.9 above 750 hPa and 0.75 below 750 hPa. The grid-mean vertical velocity is also an important factor controlling cloud fraction in the Slingo [1987] scheme. (2) The statistical schemes described by Tompkins  and Menon et al.  are used in ECHAM5 and PNNL, respectively. In these schemes, a probability density function (PDF) must be assumed to account for the subgrid-scale distribution of total water in the model grid box. The distribution of total water implicitly depends on RH. The width and skewness of this distribution are often controlled by a variety of physical attributes, such as turbulence, convection, precipitation, and the vertical gradient of total water. (3) The prognostic cloud scheme developed by Tiedtke  is used in GFDL, McRAS, and SCRIPPS. In this scheme, clouds from convective detrainment and boundary layer processes can occur at any RH, while stratiform clouds can form only when the RH exceeds a threshold value. Note that no cloud fraction parameterization is used in the CSU model; it simply sets the cloud fraction to either 0 or 1, depending on whether the total cloud condensate exceeds a critical value.
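The three families of cloud fraction schemes can be contrasted with a minimal Python sketch. The RH-based function uses the form commonly attributed to Sundqvist and co-workers, C = 1 − sqrt((1 − RH)/(1 − RHc)); we do not claim that GISS or SCAM2 implement exactly this expression. The statistical example assumes a Gaussian total-water PDF purely for illustration (the Tompkins scheme used in ECHAM5 uses a different, skewed distribution), and the all-or-nothing function mimics the CSU behavior with a hypothetical critical value. All function names are hypothetical.

```python
import math


def sundqvist_cloud_fraction(rh: float, rh_crit: float) -> float:
    """RH-based cloud fraction, C = 1 - sqrt((1 - RH) / (1 - RHc)), clipped to [0, 1]."""
    if rh <= rh_crit:
        return 0.0  # below the threshold RH: no cloud
    if rh >= 1.0:
        return 1.0  # saturated grid box: overcast
    return 1.0 - math.sqrt((1.0 - rh) / (1.0 - rh_crit))


def scam2_threshold_rh(p_hpa: float) -> float:
    """Threshold RH quoted in the text for SCAM2: 0.9 above 750 hPa, 0.75 below."""
    return 0.9 if p_hpa < 750.0 else 0.75  # "above 750 hPa" means p < 750 hPa


def gaussian_pdf_cloud_fraction(qt_mean: float, qs: float, sigma: float) -> float:
    """Fraction of an assumed Gaussian subgrid total-water PDF that exceeds saturation qs."""
    if sigma <= 0.0:
        return 1.0 if qt_mean > qs else 0.0  # no subgrid variability: all or nothing
    return 0.5 * math.erfc((qs - qt_mean) / (math.sqrt(2.0) * sigma))


def all_or_nothing_cloud_fraction(condensate: float, critical: float) -> float:
    """CSU-like behavior: cloud fraction is 1 if condensate exceeds a critical value, else 0."""
    return 1.0 if condensate > critical else 0.0


if __name__ == "__main__":
    # The same grid-mean RH (0.85) gives different answers at two SCAM2-like levels.
    for p in (500.0, 850.0):
        rhc = scam2_threshold_rh(p)
        print(p, rhc, round(sundqvist_cloud_fraction(0.85, rhc), 3))
    # Mean total water exactly at saturation: half of the Gaussian PDF is cloudy.
    print(gaussian_pdf_cloud_fraction(qt_mean=8e-3, qs=8e-3, sigma=1e-3))
```

The RH-based form produces no cloud until the threshold is crossed, whereas the PDF-based form yields partial cloudiness whenever part of the assumed distribution exceeds saturation, which is why the threshold RH and the assumed PDF width play analogous roles.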
 Most CRMs use a bulk cloud microphysics approach similar to that described by Lin et al. [1983] and Rutledge and Hobbs [1984]. The ISU CRM uses a bulk warm rain parameterization developed by Kessler  and a bulk ice parameterization described by Koenig and Murray . In these CRMs, a grid point at a given height is diagnosed as cloudy if the sum of the cloud water and cloud ice mixing ratios exceeds 1% of the saturation water vapor mixing ratio with respect to liquid [Xu and Krueger, 1991]. Cloud fraction is the number of cloudy grid points divided by the total number of grid points at that height within the CRM domain. Xu and Krueger [1991] tested the sensitivity of the CRM-diagnosed cloud fraction to this threshold and showed that their CRM produced very similar cloud amounts with threshold values of 1% and 3% of the saturation water vapor mixing ratio.
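The CRM cloud diagnostic can be sketched in a few lines of Python. Saturation over liquid is computed here with a Magnus-type (Bolton) empirical fit, which is our choice and not necessarily what each CRM uses; the function names and sample values are hypothetical.

```python
import math


def saturation_mixing_ratio_liquid(t_celsius: float, p_hpa: float) -> float:
    """Saturation water vapor mixing ratio over liquid (kg/kg), Magnus/Bolton-type fit."""
    es = 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))  # hPa
    return 0.622 * es / (p_hpa - es)


def crm_cloud_fraction(qc, qi, qs, threshold_fraction=0.01):
    """Fraction of CRM columns at one height whose qc + qi exceeds a fraction of qs.

    qc, qi, qs: sequences with one value per CRM column, in kg/kg.
    """
    cloudy = [c + i > threshold_fraction * s for c, i, s in zip(qc, qi, qs)]
    return sum(cloudy) / len(cloudy)


if __name__ == "__main__":
    qs_level = saturation_mixing_ratio_liquid(5.0, 850.0)
    qs = [qs_level] * 4
    qc = [0.0, 3e-4, 5e-5, 0.0]
    qi = [0.0, 0.0, 6e-5, 1e-5]
    # Compare the 1% threshold with the 3% sensitivity test mentioned in the text.
    print(crm_cloud_fraction(qc, qi, qs), crm_cloud_fraction(qc, qi, qs, 0.03))
```

With these hypothetical values one weakly cloudy column drops out when the threshold is raised from 1% to 3%; in a full CRM domain such marginal columns are usually few, which is consistent with the weak sensitivity reported by Xu and Krueger [1991].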
 Cumulus parameterization is required in SCMs but not in CRMs, which resolve cloud-scale circulations explicitly. SCAM2 and SCRIPPS use the convection scheme developed by Zhang and McFarlane , which is based on the same spectral rising plume concept as the Arakawa and Schubert  scheme (hereinafter AS). SCCM3_SIO also uses the Zhang-McFarlane scheme, but with a modified closure based on the work of Zhang . Other variations of the AS scheme are used in CSU, which uses a prognostic closure based on cumulus kinetic energy [Pan and Randall, 1998], and in GFDL and McRAS, which use the relaxed AS scheme developed by Moorthi and Suarez  with several modifications to the convective triggers (CTR) and inhibitors (CIN) that determine whether convection occurs (see the relevant references in Table 1). Various bulk mass flux schemes, which use a single cloud model to represent the average over all cloud types in a convective ensemble, are used in ECHAM5, GISS, and PNNL. In addition, all convective schemes are penetrative except for the HACK scheme [Hack, 1994] used in PNNL, which is based on a three-level nonentraining cloud model.
 There are also many differences among these models in parameterizing radiation and turbulent processes. The details can be found in the references listed in Table 1.
2.2.1. Large-Scale Forcing Data for SCMs and CRMs
 The large-scale forcing data required to run the SCMs and CRMs were derived from the 3-hourly sounding data at the ARM Central Facility (CF), located at 36.6N, 97.5W, and its four boundary stations, merged with data from seven NOAA wind profilers during the ARM Spring 2000 Cloud IOP, using a constrained variational objective analysis method [Zhang and Lin, 1997; Zhang et al., 2001]. Domain-averaged surface precipitation, latent and sensible heat fluxes, and radiative fluxes at the surface and the top of the atmosphere (TOA) are used as constraints that force the atmospheric state variables to satisfy column conservation of mass, heat, and moisture. These constraint fields were obtained from a dense surface measurement network, along with GOES satellite measurements, during the IOP over the ARM SGP SCM domain, which is roughly represented by the circle in Figures 1a–1d. Zhang et al. [2001] gave detailed information about the upper air data and the surface and TOA measurements used in the variational analysis.
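The essence of a constrained variational analysis is to adjust the analyzed fields as little as possible, relative to their uncertainties, while forcing them to satisfy budget constraints. The toy Python sketch below applies this idea to a single linear constraint (a column-integrated divergence that must vanish, i.e., an assumed steady surface pressure) using a Lagrange multiplier. It is a hypothetical illustration, not the Zhang and Lin [1997] scheme, which adjusts winds, temperature, and moisture subject to the mass, heat, and moisture budgets simultaneously; all numbers are invented.

```python
def constrained_adjustment(x0, weights, a, b):
    """Minimize sum_k w_k (x_k - x0_k)^2 subject to sum_k a_k x_k = b.

    The Lagrange-multiplier solution is x_k = x0_k + lam * a_k / w_k, with
    lam = (b - sum_k a_k x0_k) / sum_k (a_k^2 / w_k).
    """
    residual = b - sum(ak * xk for ak, xk in zip(a, x0))
    lam = residual / sum(ak * ak / wk for ak, wk in zip(a, weights))
    return [xk + lam * ak / wk for xk, ak, wk in zip(x0, a, weights)]


if __name__ == "__main__":
    # Hypothetical layer-mean divergence (1/s) and layer thicknesses (hPa).
    div0 = [2e-5, 1e-5, -1e-5, -1e-5]
    dp = [100.0, 200.0, 300.0, 200.0]
    # Weights = 1 / error variance; the last two layers are assumed less certain.
    sigma = [0.5e-5, 0.5e-5, 1e-5, 1e-5]
    weights = [1.0 / s**2 for s in sigma]
    div = constrained_adjustment(div0, weights, dp, 0.0)
    print("before:", sum(d * p for d, p in zip(div0, dp)))
    print("after: ", sum(d * p for d, p in zip(div, dp)))
```

Because each correction is scaled by 1/w_k, the less certain layers absorb most of the adjustment, which is the same reasoning that lets a variational analysis lean on accurate surface and TOA constraints while correcting the sparser sounding data.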
2.2.2. Evaluation Data
 The evaluation data used in this study are mainly ARM data collected during the ARM Spring 2000 IOP. These include (1) the SGP domain-averaged 3-hourly upper air data and surface and TOA measurements, (2) the ARSCL cloud frequency at the CF [Clothiaux et al., 2000], and (3) the cloud liquid and ice water contents from the MICRO-BASE products [Miller et al., 2003]. Note that these single-point cloud measurements may not represent the SGP domain-averaged values well. The ISCCP D1 3-hourly cloud products [Rossow and Schiffer, 1991, 1999], which classify cloud types by cloud-top pressure and optical thickness, and GOES satellite images are therefore also used to describe the horizontal distribution of the frontal clouds over the SGP domain. The ISCCP equal-area grid boxes are 280 km × 280 km, and the grid box nearest to the ARM site spans 35N–37.5N, 99.3W–96.2W.
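As a quick consistency check on the quoted grid box, the arithmetic below converts the 2.5° × 3.1° box nearest the ARM site into kilometers. It is a sketch that assumes a spherical Earth with a mean radius of 6371 km, writes west longitudes as negative degrees east, and evaluates the east-west width at the box's central latitude; both sides come out at roughly 278 km, consistent with the nominal 280 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption
KM_PER_DEGREE = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0  # about 111.2 km


def box_dimensions_km(lat_s, lat_n, lon_w, lon_e):
    """East-west and north-south extent (km) of a latitude-longitude box."""
    dy = (lat_n - lat_s) * KM_PER_DEGREE
    mid_lat = math.radians(0.5 * (lat_s + lat_n))
    # Width evaluated at the central latitude (an approximation for a finite box).
    dx = (lon_e - lon_w) * KM_PER_DEGREE * math.cos(mid_lat)
    return dx, dy


# ISCCP box nearest the ARM site: 35N-37.5N, 99.3W-96.2W (west longitudes negative).
dx, dy = box_dimensions_km(35.0, 37.5, -99.3, -96.2)
print(f"{dx:.0f} km x {dy:.0f} km")
```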
3. Spring Frontal Case
 The time period chosen for this study is from 1800 UT, 1 March to 2400 UT, 3 March 2000. During this period, a strong cold front first developed from a low-pressure system over southwestern Colorado at 0000 UT, 2 March. It moved southeastward into southeastern New Mexico and northwestern Texas overnight and met a stationary front that had formed earlier over central Texas. Both fronts then moved northeastward. The cyclone and the associated strong frontal system approached the ARM SGP site around 0900 UT, 2 March and produced heavy rainfall. After 1800 UT, 2 March, the strong frontal system passed the site and surface rainfall decreased significantly. Figures 1a–1d are four snapshots showing the GOES satellite images, along with the surface low-pressure systems and surface fronts, during the periods when the fronts formed, approached, and passed over the SGP site. The circle in these figures denotes the SGP domain, and the dot at the center of the circle is the location of the ARM Central Facility. The SGP domain-averaged surface precipitation rates are shown by the dotted line in Figure 2a.
 The GOES satellite data (Figure 1a) indicate that there were several cloud streaks ahead of the major frontal cloud band that later affected the SGP site. These prefrontal cloud streaks passed across the SGP site from 0000 UT to 0600 UT, 2 March. The major cloud band lay ahead of the cold front, to the northeast of the surface cyclone. It crossed the SGP site from 0900 UT to 2100 UT, 2 March, with strong surface rainfall (Figures 1a–1b and 2a). After the frontal passage, clouds decreased in the SGP domain until the low- and middle-level postfrontal clouds moved into the domain around 0600 UT, 3 March (Figures 1c–1d). These low and middle clouds corresponded to the head of the comma-shaped cloud system and were associated with cyclonic air wrapping around to the north and west of the cyclone. After 1500 UT, 3 March, a high-pressure system dominated the Great Plains, and the ARM SGP site was generally dry and clear.
 To facilitate the investigation, we divide the frontal passage over the SGP into three periods and refer to the associated clouds as prefrontal, frontal, and postfrontal clouds, corresponding to the periods from 1800 UT, 1 March to 0900 UT, 2 March; from 0900 UT, 2 March to 2100 UT, 2 March; and from 2100 UT, 2 March to 2400 UT, 3 March, respectively. In the following discussion, we use A, B, and C to denote these three periods, which are marked in Figure 2a. Information on the cloud types in the frontal system can also be obtained from the ISCCP data. The cloud-top pressure and optical thickness histograms averaged over the SGP domain from the 3-hourly ISCCP data for the three periods are shown in Figures 3a–3c. Since ISCCP clouds are not available at night, only daytime data from these periods are used in these figures. The classification of cloud types according to cloud-top pressure and cloud optical thickness follows Rossow and Schiffer [1991, 1999] and is given in Table 3. Optically thin clouds with tops at high and middle levels (mainly cirrus) are the major cloud types in the prefrontal system associated with the cloud streaks (Figure 3a), while optically thick deep convective clouds dominate the major cloud band (Figure 3b). Accompanying the deep cumulonimbus clouds, optically thick nimbostratus and stratus clouds are also seen at middle and low levels. In contrast, Figure 3c indicates that clouds of medium and high optical thickness with tops at low and middle levels are the main cloud types in the postfrontal system.
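The standard ISCCP scheme of Rossow and Schiffer [1999] bins clouds by cloud-top pressure (boundaries at 440 and 680 hPa) and optical thickness (boundaries at 3.6 and 23). The Python sketch below assumes that Table 3 follows these standard boundaries; the function name and the sample pixels are hypothetical.

```python
from collections import Counter


def isccp_cloud_type(ctp_hpa: float, tau: float) -> str:
    """Map cloud-top pressure (hPa) and optical thickness to an ISCCP cloud type."""
    if ctp_hpa < 440.0:  # high cloud tops
        names = ("cirrus", "cirrostratus", "deep convection")
    elif ctp_hpa < 680.0:  # middle cloud tops
        names = ("altocumulus", "altostratus", "nimbostratus")
    else:  # low cloud tops
        names = ("cumulus", "stratocumulus", "stratus")
    if tau < 3.6:  # optically thin
        return names[0]
    if tau < 23.0:  # optically medium
        return names[1]
    return names[2]  # optically thick


if __name__ == "__main__":
    # Hypothetical (cloud-top pressure, optical thickness) pixels.
    pixels = [(250.0, 1.2), (300.0, 2.5), (350.0, 40.0), (600.0, 30.0), (800.0, 12.0)]
    print(Counter(isccp_cloud_type(p, t) for p, t in pixels))
```

Counting pixel types in this way is the same operation that underlies the joint histograms in Figures 3a–3c, only collapsed from the full pressure and optical thickness bins to the nine named types.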
Table 3. Classification of Cloud Types Used in This Study
 While the GOES satellite images and the ISCCP data give a useful description of the horizontal cloud distributions and cloud types associated with the frontal system, they do not show the vertical structure of the frontal clouds. To complement the satellite measurements, Figure 2a shows the single-station cloud frequency from the ARSCL products at the ARM CF during this period. The ARSCL data have an original resolution of 10 s in time and 45 m in height; they are averaged to 3-hour and 25-hPa intervals to better represent clouds in the SGP domain. These intervals are also consistent with the ARM variational analysis data.
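The averaging of the high-resolution ARSCL cloud mask into 3-hour, 25-hPa cloud frequencies can be sketched as follows. The sketch assumes that each 10-s, 45-m sample has already been assigned a pressure (converting ARSCL heights would require a pressure profile) and that bins are aligned to multiples of 3 hours and 25 hPa; the sample format and function name are hypothetical.

```python
from collections import defaultdict


def cloud_frequency(samples, dt_bin_s=3 * 3600.0, dp_bin_hpa=25.0):
    """Average a binary cloud mask into (time, pressure) bins.

    samples: iterable of (time_s, pressure_hpa, is_cloudy) tuples.
    Returns {(time_bin, pressure_bin): fraction of samples that are cloudy}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [cloudy samples, all samples]
    for t_s, p_hpa, is_cloudy in samples:
        key = (int(t_s // dt_bin_s), int(p_hpa // dp_bin_hpa))
        counts[key][1] += 1
        if is_cloudy:
            counts[key][0] += 1
    return {key: cloudy / total for key, (cloudy, total) in counts.items()}


if __name__ == "__main__":
    # Hypothetical samples: three cloudy out of four in one bin, one cloudy in the next.
    samples = [(0, 510.0, True), (10, 510.0, False), (20, 505.0, True),
               (30, 505.0, True), (3 * 3600, 510.0, True)]
    print(cloud_frequency(samples))
```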
 The ARSCL clouds show large temporal variability as the prefrontal, frontal, and postfrontal systems crossed the site. Consistent with the ISCCP clouds, the prefrontal system was primarily associated with high clouds. The two local maxima of high clouds at about 0000 and 0300 UT, 2 March correspond to the two cloud streaks in Figure 1a. Between the second cloud streak and the main cloud band, a cloud minimum appears around 0600 UT, 2 March in both Figures 1a and 2a. When the major frontal system crossed the SGP site, the ARM active remote sensors at the CF observed large cloud amounts at all altitudes, with a maximum at 1500 UT, 2 March. Heavy rainfall was also observed during this period. Behind the front, the observations show a transition period between 2100 UT, 2 March and 0300 UT, 3 March in which clouds were considerably reduced. An extensive deck of low-level postfrontal clouds is seen after 0300 UT, 3 March. Within this period, a second peak of middle and high postfrontal clouds was observed around 0600–0900 UT, 3 March. This second peak was associated with cyclonic cloudy air wrapping around from the back of the cyclone (Figures 1c and 1d). As shown later, this peak of middle clouds contained very small amounts of cloud liquid and ice and was associated with domain-averaged downward motion at around 500 hPa. It is thus likely associated with advection of hydrometeors. Other processes, such as upward motion at subgrid scales that cannot be represented in the observed large-scale vertical motion field, could also contribute to the second peak of middle and high postfrontal clouds. It should be noted that the cloud information given by the ARSCL data and the ISCCP data differs in some respects. The ARSCL data provide both cloud top and cloud base for each cloud layer, while the ISCCP satellite measurements provide only the cloud top of the uppermost layer. However, the ARSCL clouds are single-station measurements, while the ISCCP clouds shown in Figures 3a–3c are averaged over a domain close in size to the SGP domain. The isolated peak of high and middle clouds slightly before 0000 UT, 3 March in Figure 2a does not correspond to a large-scale or mesoscale system in Figure 1c and is therefore only a small-scale feature.
Figures 2b–2c show the vertical distributions of the liquid water content (LWC) and ice water content (IWC) retrieved from the MMCR cloud radar reflectivity (Z) at the CF for this case [Miller et al., 2003]. These data are averaged from the 10-s, 45-m ARSCL intervals to 3-hour, 25-hPa intervals and then multiplied by the ARSCL cloud frequency to yield all-sky averages. The cloud water and ice phases are determined from the domain-averaged temperature from the variational analysis: all ice is assumed when T ≤ −16°C, all liquid is assumed when T ≥ 0°C, and cloud water and ice can coexist when −16°C < T < 0°C.
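Multiplying an in-cloud mean by the cloud frequency is equivalent to averaging over all samples with clear samples counted as zero, which is why the product represents an all-sky average. The minimal Python sketch below, assuming that the 3-hour, 25-hPa mean is taken over cloudy samples only, checks this identity on hypothetical values; the function name is hypothetical.

```python
def all_sky_mean(values, cloudy_mask):
    """All-sky mean = (mean over cloudy samples) x (fraction of samples that are cloudy)."""
    cloudy_values = [v for v, c in zip(values, cloudy_mask) if c]
    if not cloudy_values:
        return 0.0  # a completely clear bin contributes no condensate
    in_cloud_mean = sum(cloudy_values) / len(cloudy_values)
    cloud_frequency = len(cloudy_values) / len(values)
    return in_cloud_mean * cloud_frequency


if __name__ == "__main__":
    lwc = [0.20, 0.0, 0.40, 0.0, 0.30]  # g/m^3, hypothetical; clear samples hold 0
    mask = [True, False, True, False, True]
    # The two numbers agree up to floating-point rounding.
    print(all_sky_mean(lwc, mask), sum(lwc) / len(lwc))
```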
 The algorithm used to retrieve LWC is based on the Z-LWC relation proposed by Liao and Sassen , in which the radar reflectivity profile is used in conjunction with an adiabatic representation of cloud liquid water to derive a reflectivity-to-liquid water content transformation. This reflectivity-based LWC profile is then scaled by the liquid water path (LWP) measured by a microwave radiometer. IWC is retrieved from a direct radar reflectivity transformation [Liu and Illingworth, 2000]. For mixed-phase clouds, the observed radar reflectivity is partitioned linearly so that IWC increases steadily with decreasing temperature until all ice is assumed at −16°C.
 The accuracy of the retrieved LWC and IWC is always a concern when these values are used to identify model problems. Quantitative estimation of the uncertainty in these fields is still the subject of ongoing research, and a rigorous evaluation of the accuracy of the retrieved LWC and IWC profiles is currently underway by the ARM Cloud Properties Working Group. In general, estimates suggest that the uncertainty in LWC is strongly modulated by the assumptions made in the original adiabatic scaling, as would be expected. If all of the LWP uncertainty were attributed to a single radar range gate, assumed to be 100 m deep for the purpose of this calculation, the uncertainty in LWC would be ∼30% for an adiabatic cloud with a cloud base temperature of 8°C. The accuracy of the derived IWC depends mainly on the variability in ice particle density and size, as discussed by Liu and Illingworth [2000]. Individual IWC values derived from a single radar reflectivity measurement with the current algorithm are reported to have errors of +100% and −50%. The errors in the LWC and IWC fields can be larger when rain or snow is present, since the retrieval algorithm cannot discriminate between cloud liquid and rain or between cloud ice and snow. Therefore the radar-retrieved cloud liquid and ice are less reliable during heavy precipitation. Moreover, the temperature used to distinguish cloud water from ice is a tunable parameter in both the observations and the models. Different threshold temperatures are used in the models that distinguish ice from liquid based on temperature. For example, some models (e.g., GISS, McRAS, and SCAM2) assume all liquid when T ≥ −10°C, and most models assume all ice at a colder threshold temperature (e.g., T ≤ −40°C). Other models use separate, physically based equations to predict cloud water and ice, so no threshold temperature is used to separate the two phases in these models (e.g., CSU, ECHAM5, GFDL, and PNNL).
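To see how much the phase threshold alone can matter, the sketch below partitions a fixed amount of condensate with a linear temperature ramp, once with the observational endpoints (−16°C and 0°C) and once with the model-like endpoints quoted above (−40°C and −10°C). The linear ramp between the model endpoints is our assumption, and for the observations the text states that the reflectivity, rather than the condensate mass, is partitioned linearly, so this is only an approximation. Function names are hypothetical.

```python
def ice_fraction(t_celsius: float, t_all_ice: float, t_all_liquid: float) -> float:
    """Ice fraction of condensate: 1 below t_all_ice, 0 above t_all_liquid, linear in between."""
    if t_celsius <= t_all_ice:
        return 1.0
    if t_celsius >= t_all_liquid:
        return 0.0
    return (t_all_liquid - t_celsius) / (t_all_liquid - t_all_ice)


def split_condensate(total, t_celsius, t_all_ice=-16.0, t_all_liquid=0.0):
    """Return (liquid, ice); the defaults follow the observational -16/0 degC endpoints."""
    fi = ice_fraction(t_celsius, t_all_ice, t_all_liquid)
    return total * (1.0 - fi), total * fi


if __name__ == "__main__":
    for t in (-5.0, -12.0, -25.0):
        obs = split_condensate(1.0, t)  # observational endpoints
        model = split_condensate(1.0, t, t_all_ice=-40.0, t_all_liquid=-10.0)  # model-like
        print(t, obs, model)
```

At −25°C, for example, the observational partition treats all condensate as ice, whereas the model-like endpoints split it evenly, so part of any model-observation difference in LWC and IWC at these temperatures can come from the phase assumption rather than from the total condensate.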
 Similar to the cloud field, the cloud water and ice contents (Figures 2b–2c) also show large temporal variability related to the prefrontal, frontal, and postfrontal systems. The cloud radar observed only very small amounts of LWC and IWC during the prefrontal period (A). In contrast, considerable LWC was observed during and after the frontal passage (periods B and C). There are two LWC maxima in the lower troposphere, associated with the frontal clouds and the postfrontal clouds. As in the cloud field, a transition period is also seen between the two LWC maxima, when LWC was reduced to almost zero between 2100 UT, 2 March and 0300 UT, 3 March, corresponding to dry intrusive cyclonic air passing over the SGP. There are three IWC maxima in the middle and upper troposphere, associated with the three cloud maxima at these levels over the same periods. As mentioned before, the second maximum appeared to be a local feature in the satellite image, and it was not accompanied by an LWC maximum in the lower troposphere. Compared with the first IWC maximum, the third was not only considerably weaker but also at a much lower altitude.
 In both the SCMs and CRMs, these cloud systems need to be calculated from large-scale environmental conditions similar to those simulated in GCMs (e.g., grid-scale vertical motion and relative humidity etc.). Figures 4a–4f show the domain-averaged vertical velocity (omega), RH (calculated from domain-averaged temperature and moisture), and total and horizontal advective tendencies of dry static energy (S) and moisture (q), respectively. The dry static energy values are normalized by Cpd (1004 J kg−1K−1). These fields are derived from the ARM observations using the variational analysis approach [Zhang and Lin, 1997; Zhang et al., 2001]. During the prefrontal passage (period A), the omega field (Figure 4a) indicated a weak upward motion in the middle and upper troposphere and a slightly stronger downward motion in the middle and lower troposphere between 1800 UT, 1 March and 0600 UT, 2 March. The cloud streaks in Figure 1a during this period showed that the upward motion during this period is spatially distributed in narrow regions before the frontal passage rather than uniformly across the averaging domain. Associated with the domain-averaged vertical motion, a weak cold advection is seen in the levels between 665 hPa and 215 hPa and a weak warm advection is seen below 665 hPa in the large-scale dry static energy field (Figure 4c). The large-scale moisture field (Figure 4d) showed a moisture convergence above 715 hPa and a moisture divergence below. The horizontal advection of dry static energy was negative in the first few hours and became positive after 0000 UT, 2 March in almost all altitudes (Figure 4e) while the moisture field had convergence above 815 hPa and divergence below (Figure 4f). The RH field (Figure 4b) showed that the prefrontal air was quite moist in the ascent region and it was rather dry in the subsidence region. During the frontal passage (period B), the large-scale forcing fields showed a strong upward motion in all altitudes. 
There was also large warm horizontal advection in the lower troposphere levels below 665 hPa, associated with the main southerly air stream before the cold front. As a result, the frontal cloud system was associated with a strong total advective cooling in the middle and upper troposphere between 765 hPa and 365 hPa and a relatively strong total advective warming in the lower troposphere. The total moisture advection near the surface is slightly negative, resulting mainly from negative horizontal advection. The negative horizontal advection is related to easterly near-surface wind that advected dry air from a preceding high-pressure system into the SGP domain. The vertical moisture advection (not shown) is positive below 915 hPa and negative between 915 and 815 hPa related to the local vertical maximum of moisture near 800 hPa associated with the moisture transport by the prefrontal southerly air stream. These features of warming and drying in the lower troposphere have large impact on model simulations of clouds. Given the large variability of meteorological fields across the domain during this period, the sounding network is too sparse to give us high confidence in their magnitudes in the objective analysis. However, these features are also seen in mesoscale forecasting models. The relative humidity distribution shown in Figure 4b below 800 hPa at around 1200 to 1800 UT, 2 March also corroborated with this analysis. In the postfrontal period, the early part (before 1200 UT, 3 March) is associated with weak ascending motion in almost all altitudes except for the levels between 565 and 465 hPa around 0600–1200 UT, 3 March where large-scale downward motion is seen. The later part is associated with large-scale downward motion, warm dry static energy advection, and moisture divergence in the whole troposphere. Lower clouds persisted throughout this period. 
The relative humidity peak near 500 hPa at 0600–0900 UT, 3 March is associated with downward motion as well as advective warming and drying, all of which would act to reduce the relative humidity locally. This peak, which corresponds to the cloud peak with small IWC in Figure 2a, is therefore presumed to result from the advection of hydrometeors.
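The two derived quantities plotted in Figure 4 can be sketched as follows. This is a minimal illustration, not the variational analysis code: it assumes S = CpdT + gz with g = 9.81 m s−2 and uses the Bolton (1980) fit for saturation vapor pressure over liquid water; all function names and sample values are hypothetical. The usage lines contrast RH computed from domain-averaged T and q (as in Figure 4b) with the average of column RH values for two hypothetical columns, one moist ascent streak and one dry subsidence region, to show that the two quantities differ when upward motion is concentrated in narrow regions.

```python
import numpy as np

CPD = 1004.0   # J kg-1 K-1, value quoted in the text
G = 9.81       # m s-2 (assumed)
EPS = 0.622    # ratio of dry-air to water-vapor gas constants

def normalized_dse(temp_k, height_m):
    """Dry static energy S = Cpd*T + g*z divided by Cpd, in kelvin."""
    return np.asarray(temp_k, dtype=float) + G * np.asarray(height_m, dtype=float) / CPD

def saturation_mixing_ratio(temp_k, pres_hpa):
    """Saturation mixing ratio over liquid water (kg/kg), Bolton (1980) e_s fit."""
    tc = np.asarray(temp_k, dtype=float) - 273.15
    es = 6.112 * np.exp(17.67 * tc / (tc + 243.5))  # hPa
    return EPS * es / (pres_hpa - es)

def relative_humidity(temp_k, q_kgkg, pres_hpa):
    """RH (fraction) as vapor mixing ratio over its saturation value."""
    return np.asarray(q_kgkg, dtype=float) / saturation_mixing_ratio(temp_k, pres_hpa)

# Two hypothetical columns at 500 hPa: a moist ascent streak and a dry subsidence region
T = np.array([255.0, 259.0])
q = np.array([1.6e-3, 0.4e-3])
rh_of_domain_mean = relative_humidity(T.mean(), q.mean(), 500.0)
mean_of_column_rh = relative_humidity(T, q, 500.0).mean()
print(f"RH(mean T, mean q) = {rh_of_domain_mean:.2f}, mean(RH) = {mean_of_column_rh:.2f}")
```

Because saturation mixing ratio is nonlinear in temperature, RH of the domain-mean state is not the mean of the local RH values, which is one reason domain-averaged forcing cannot represent narrow cloud streaks.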
 All participating SCMs and CRMs are driven by the large-scale forcing derived from the ARM March 2000 IOP observations, in which the horizontal and vertical advective tendencies of moisture and dry static energy (temperature plus the adiabatic expansion term) and the surface fluxes are specified from the observations. In the CRMs, the forcing is applied uniformly to every CRM grid point. Vertical motions at individual CRM grid points are computed by the model cloud-scale dynamics under a constraint that forces the domain-averaged vertical motion to equal the observed value. Because the observed forcing data are available only every 3 hours, they are interpolated in time between the 3-hourly analysis times when used to drive the SCMs and CRMs. The models are initialized at 1800 UT, 1 March and run through the frontal case. Zhang et al. give a detailed description of the experiment design (see http://science.arm.gov/wg/cpm/scm/scmic4/).
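The time interpolation of the 3-hourly forcing can be sketched as below. The text does not state the interpolation scheme, so linear interpolation is assumed here; the function name, the 30-minute model step, and the forcing values are hypothetical.

```python
import numpy as np

def interpolate_forcing(t_obs_h, forcing_obs, t_model_h):
    """Linearly interpolate 3-hourly forcing profiles to model times (assumed scheme).

    t_obs_h:     (nt,) analysis times, hours since model start (increasing)
    forcing_obs: (nt, nlev) forcing at each analysis time and level,
                 e.g. an advective tendency of q or S/Cpd
    t_model_h:   (nm,) model times, hours since model start
    Returns an (nm, nlev) array; np.interp holds end values fixed outside the data range.
    """
    forcing_obs = np.asarray(forcing_obs, dtype=float)
    return np.column_stack(
        [np.interp(t_model_h, t_obs_h, forcing_obs[:, k]) for k in range(forcing_obs.shape[1])]
    )

# Hypothetical example: two levels, analyses at 0, 3, and 6 h, 30-minute model step
t_obs = np.array([0.0, 3.0, 6.0])
forcing = np.array([[0.0, 1.0],
                    [3.0, 4.0],
                    [6.0, 7.0]])
t_model = np.arange(0.0, 6.01, 0.5)
forcing_model = interpolate_forcing(t_obs, forcing, t_model)
```

Linear interpolation keeps the forcing continuous in time; a piecewise-constant hold would be an equally simple alternative but would introduce jumps every 3 hours.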
 As was noted earlier, heavy surface rainfall was observed as the frontal cloud system passed across the SGP site. The hourly Arkansas-Red Basin River Forecast Center (ABRFC) 4-km rain-gauge-adjusted WSR-88D radar measurements indicated that the rain came mainly from stratiform clouds associated with the large-scale frontal system, as shown by the solid black lines in Figures 5c–5d (convective) and 5e–5f (stratiform). The separation of the total radar rainfall into convective and stratiform components is based on the algorithm described in the work of Johnson and Hamilton . A 6 mm hr−1 threshold is used for the convective-stratiform partitioning in this study. Given the observed large-scale forcing, both the SCMs and CRMs are able to capture the synoptic-scale-dominated precipitation event, as shown in Figures 5a–5b, which compare the observed SGP domain-averaged total precipitation with that from the SCMs and CRMs, respectively. However, the SCMs and CRMs differ significantly when the total precipitation is partitioned into convective and stratiform components (Figures 5c–5f). For the CRMs, the partitioning follows the approach proposed by Xu . As defined in the work of Xu , the convective region includes a core and two adjacent grid columns. A core consists of at least one grid column that satisfies one of the following three conditions: (1) the maximum cloud draft strength (∣wmax∣) is twice as large as the average over the four adjacent grid columns, (2) ∣wmax∣ is greater than 3 m s−1, or (3) the surface precipitation rate is larger than 25 mm hr−1. These criteria are generally consistent with observations of mesoscale convective systems [e.g., Houze, 1977, 1993]. In most SCMs, the majority of the rain is produced by the stratiform cloud parameterization; the exception is SCAM, which generates almost all of its precipitation from its convective parameterization.
In addition, PNNL produces a considerable amount of convective precipitation (∼40%), and McRAS produces large convective precipitation over the period from 1800 UT, 2 March to 0900 UT, 3 March. In contrast, convective precipitation dominates the total precipitation in the CRM simulations (data are not available for UCLA/LaRC). It is not clear whether this is due to the different algorithms used in the CRMs and the observations to distinguish the convective and stratiform components or to the application of uniform advective forcing in the CRMs. In addition, the lack of a meridional temperature gradient in these 2-D CRMs may also be partially responsible for their failure to generate the mesoscale circulations of frontal cloud systems, which are usually associated with large meridional temperature gradients. Simulating this frontal case with a 3-D CRM should provide additional insight into this issue and warrants further study.
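The CRM convective-stratiform partitioning criteria quoted above (after Xu) can be sketched for a single time step of a 2-D CRM. This is an illustrative reading, not the code used by the participating CRMs: it assumes periodic lateral boundaries, interprets the "four adjacent grid columns" as two on each side, takes the "two adjacent grid columns" of the convective region as one on each side of a core column, and uses hypothetical function names and test values.

```python
import numpy as np

def convective_mask(wmax, precip_mmhr, w_thresh=3.0, p_thresh=25.0):
    """Flag convective columns in a 2-D (x-z) CRM snapshot.

    wmax:        (nx,) maximum |w| in each grid column, m s-1
    precip_mmhr: (nx,) surface precipitation rate, mm hr-1
    Periodic lateral boundaries are assumed (np.roll wraps around).
    """
    w = np.abs(np.asarray(wmax, dtype=float))
    p = np.asarray(precip_mmhr, dtype=float)
    # Mean |wmax| over the four neighbors (two on each side)
    neighbor_mean = (np.roll(w, 1) + np.roll(w, -1) + np.roll(w, 2) + np.roll(w, -2)) / 4.0
    # Core: any one of the three criteria; w > 0 keeps calm columns from passing test (1)
    core = ((w > 0.0) & (w >= 2.0 * neighbor_mean)) | (w > w_thresh) | (p > p_thresh)
    # Convective region = each core column plus one adjacent column on either side
    return core | np.roll(core, 1) | np.roll(core, -1)

def partition_precip(wmax, precip_mmhr):
    """Domain-mean (convective, stratiform) precipitation rates, mm hr-1."""
    p = np.asarray(precip_mmhr, dtype=float)
    conv = convective_mask(wmax, p)
    return p[conv].sum() / p.size, p[~conv].sum() / p.size

# Hypothetical 10-column domain with one vigorous updraft in column 5
wmax = np.full(10, 0.5)
wmax[5] = 4.0
precip = np.ones(10)
precip[5] = 30.0
conv_rate, strat_rate = partition_precip(wmax, precip)
```

Because the neighbors of a strong updraft are also swept into the convective region, even weakly raining columns next to a core are counted as convective, which is one way such algorithms can shift precipitation toward the convective category.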
4.1. Model-Simulated 2-D Cloud Fields
 The 2-D time-pressure cross sections of the SGP domain-averaged cloud fields are first examined to evaluate the overall performance of the CRMs and SCMs in capturing the frontal cloud system. Figure 6 shows the cloud fraction generated by the four CRMs. As described in section 2.1, the cloud fraction in a CRM is the fraction of cloudy grid points within the CRM domain. A grid point is diagnosed as cloudy if it contains significant cloud liquid and ice, that is, if the sum of the cloud water and cloud ice mixing ratios exceeds 1% of the saturation water vapor mixing ratio with respect to liquid. Compared with the ARSCL clouds (Figure 2a) and satellite measurements (Figures 1 and 3a–3c), all models generally capture the bulk structure of the high and middle clouds in the prefrontal system and the low clouds in the postfrontal system. However, the prefrontal high clouds are overestimated and the frontal middle and low clouds are substantially underestimated. Behind the front, the CRMs greatly underestimate clouds at middle levels. Figure 6 also reveals timing errors: all the models generate the weaker prefrontal high clouds 3–6 hours late, and most models (except ISU) generate the postfrontal low clouds 3 hours late. In addition, the model-simulated clouds tend to have a longer lifetime and weaker temporal variability than observed. Neither the broken nature of the prefrontal clouds (i.e., a cloud minimum between the two prefrontal cloud bands) nor the peaks of the middle and high postfrontal clouds are captured by the models. The failure to capture the observed temporal variability is likely due in part to the large-scale forcing being applied uniformly to the CRM grid points, whereas in reality the temporal variability of the clouds is mainly related to subgrid-scale dynamics, as shown in Figure 1.
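The CRM cloud-fraction diagnostic described above can be sketched as follows. The 1% threshold relative to the saturation mixing ratio over liquid is taken from the text; the saturation formula (Bolton, 1980), the (levels, columns) array layout, and all names and values are assumptions for illustration.

```python
import numpy as np

EPS = 0.622  # ratio of dry-air to water-vapor gas constants

def qsat_liquid(temp_k, pres_hpa):
    """Saturation mixing ratio w.r.t. liquid water (kg/kg); Bolton (1980) e_s fit assumed."""
    tc = np.asarray(temp_k, dtype=float) - 273.15
    es = 6.112 * np.exp(17.67 * tc / (tc + 243.5))  # hPa
    return EPS * es / (pres_hpa - es)

def crm_cloud_fraction(qc, qi, temp_k, pres_hpa, frac=0.01):
    """Fraction of cloudy grid points at each level of a 2-D CRM snapshot.

    qc, qi, temp_k: (nz, nx) cloud water and cloud ice mixing ratios (kg/kg), T (K)
    pres_hpa:       (nz,) level pressures (hPa)
    A point is cloudy when qc + qi > frac * qsat_liquid(T, p).
    """
    qs = qsat_liquid(temp_k, np.asarray(pres_hpa, dtype=float)[:, None])
    cloudy = (np.asarray(qc, dtype=float) + np.asarray(qi, dtype=float)) > frac * qs
    return cloudy.mean(axis=1)

# Hypothetical 2-level, 4-column snapshot at a uniform 260 K
temp = np.full((2, 4), 260.0)
pres = np.array([500.0, 300.0])
qc = np.array([[1e-4, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.0]])
qi = np.array([[0.0, 0.0, 0.0, 0.0],
               [1e-4, 1e-4, 1e-6, 0.0]])
cf = crm_cloud_fraction(qc, qi, temp, pres)
```

Because the diagnostic depends only on condensate, a CRM grid point can be near saturation yet count as clear, which is why CRM cloud fraction tracks the microphysical fields more closely than relative humidity.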
 The horizontal advection of hydrometeors is important for SCMs and CRMs to correctly capture the timing of cloud formation, especially in a highly advective frontal cloud system. As discussed in the work of Xu et al. , the delayed onset of high clouds might be caused by the lack of horizontal hydrometeor advection in the specified large-scale forcing, so that the models need more time to generate clouds. In addition, all the models were initialized with zero clouds, whereas the observations (Figure 1a) show clouds present at the beginning of the study period; this likely accounts in part for the delayed onset of high clouds in the models. The tendency of the CRMs to overestimate the amount of high clouds in the prefrontal period may be due to the lack of horizontal inhomogeneity in the thermodynamic forcing: the model atmospheres remained cloud-free until the uniformly applied cooling and moistening tendencies forced the models to form clouds, whereas in reality cloud streaks formed over small areas of concentrated upward motion. The underestimation of middle and low clouds during the frontal passage is partially related to the warm and dry advective tendencies applied in the lower atmosphere. As discussed before, even though there are potential uncertainties in the forcing data, this appears to be a real feature. Another possible cause of the model bias is the lack of mesoscale circulation in the CRM simulations. As shown in Figure 5, when driven by strong uniform forcing, the CRMs initiate convection in place of the stratiform processes that are organized at the mesoscale. Consistent with this hypothesis, ISU has more middle-level clouds than the other CRMs, together with less convective and more stratiform precipitation.
The CRMs also fail to simulate the mesoscale frontal circulations well partly because they are not properly initialized and, as 2-D models, lack a temperature gradient in the y direction, as discussed before. The underestimation of middle clouds in the postfrontal period (peak at 0600–0900 UT, 3 March) could be explained by the lack of organized hydrometeor advection at the back of the cyclone and the lack of subgrid-scale dynamical structure in the uniformly applied large-scale forcing.
 While the overall performance of the different CRMs is comparable, there are differences (sometimes quite large) among the CRM simulations. For example, ISU produces more clouds at middle levels than the other CRMs, and CSU_SAM produces the smallest amount of low-level postfrontal clouds among the CRMs. In general, the results from CSU_SAM, UCLA/LaRC, and ARPS/LaRC are closer to one another than to those from ISU, likely because the ISU model uses a cloud microphysical scheme quite different from those used in the other three CRMs (see Table 2). Note that the cloud ice content in ISU includes contributions from the model-produced snow content. This, together with the larger stratiform precipitation produced by this model, might explain why ISU has more middle-level clouds than the other CRMs.
 The frontal clouds are also generally captured by the 9 SCMs, as shown in Figure 7. All models generate large cloud amounts as the front passed over the site, and most of them produce the low- and middle-level postfrontal clouds, consistent with the observations. However, there are significant differences between the models and the observations, among the models, and between the SCMs and CRMs. Like the CRMs, the SCMs greatly overestimate high clouds during the prefrontal period, which is expected given the uniformly applied upward motion. During the period of strong upward motion (period B), in contrast to the CRMs, most SCMs except ECHAM5 and SCAM generate clouds in the middle and lower troposphere. Several models even overestimate middle- and low-level clouds, except at levels below 815 hPa. This is consistent with the overestimation of optically thick high-top clouds seen in most GCMs [Norris and Weaver, 2001; Tselioudis and Jakob, 2002; Zhang et al., 2005]. With the exception of ECHAM5 and SCAM, the cloud distributions in the SCMs during this period resemble the joint distribution of the applied large-scale cooling and moistening shown in Figures 4c and 4d. Except for the GISS and PNNL SCMs, the models fail to simulate clouds below 815 hPa. This failure is related to the warm and dry advective forcing, although the radar cloud base may also be inaccurate during heavy precipitation because the cloud signal was contaminated by precipitation. Differences in the algorithms for cloud and rain evaporation are possible causes of the different behavior of GISS and PNNL. During the postfrontal period, several models simulated a large amount of middle clouds. These middle clouds form through mechanisms different from those responsible for the observed cloud peaks at 0600–0900 UT, 3 March; they result from the long lifetime of either hydrometeors or prognostic clouds.
 The SCMs show larger intermodel differences than the CRMs, which could be related to the different cloud parameterizations used in the SCMs. Models that use similar schemes appear to produce some common features in the cloud field. For example, clouds tend to last longer in models that use prognostic cloud schemes (e.g., GFDL and SCRIPPS) than in those that use RH-based diagnostic schemes (e.g., GISS and SCAM), with SCCM3_SIO as an exception. The cloud structure simulated by the statistical cloud scheme in ECHAM5 is quite different from the others; ECHAM5 generates far fewer middle and low clouds in the prefrontal system. The CSU model appears to overestimate clouds at all altitudes during the entire period, presumably because of its lack of fractional cloudiness (it assumes a cloud fraction of either 0 or 1).
 In the CRMs, cloud fraction depends primarily on the cloud water and ice contents, and the relative importance of LWC and IWC depends on the altitude (and/or temperature) at which the clouds form. Figure 8 displays the CRM-produced cloud liquid water content. The ARM observations (Figure 2b) show two maxima, one associated with strong precipitation (period B) and the other with lower-level upward motion in the postfrontal period (period C). Consistent with the observations, all the CRMs except ISU produce two LWC maxima; the ISU CRM generates only one LWC maximum, behind the front. However, the vertical extent of the observed LWC is not well captured by the models. The model-produced LWC tends to stay in a thinner layer, and the magnitude of the first LWC maximum is significantly underestimated by all the CRMs. This bias is closely related to the underestimation of middle and low clouds in the CRMs during this period, and, as discussed before, overly active convection in the models could be one possible cause. Behind the front, the models also produce little or no LWC above 765 hPa, where a large amount of LWC was observed; this could be explained by the lack of horizontal hydrometeor advection. In addition, the second LWC maximum is delayed by 3 hours in the CSU_SAM, UCLA/LaRC, and ARPS/LaRC simulations. These problems are consistent with the errors in the CRM-produced middle and low clouds. The ISU-produced LWC field is noticeably different from those of the other three CRMs, which presumably reflects the differences among the microphysical schemes described in section 2.1.
 The CRM-produced cloud ice water contents are shown in Figure 9. Since the observed cloud ice includes snow, the model-produced snow is added to the model cloud ice content for a consistent comparison with the observations. The observations (Figure 2c) show three maxima. The CRMs cannot capture this temporal variability because they lack subgrid-scale dynamical forcing and mesoscale advection of cloud ice. Once again, this indicates that CRMs uniformly forced with the mean large-scale forcing have difficulty capturing the large temporal variability of frontal clouds, which is closely related to subgrid-scale dynamics. The CRM-generated IWC fields last much longer than observed, which may be related to the timescale of the applied domain-averaged forcing. Overall, the ice fields simulated by the CRMs are consistent with one another. Their magnitudes are in general agreement with the observations, and the differences between their patterns and the observed ones are expected given the domain-averaged forcing fields and the lack of hydrometeor advection. Interestingly, the ISU model produces its maximum IWC around 665 hPa, much lower than the other three CRMs, which show the maximum IWC between 465 hPa and 365 hPa.
 The simulated LWC and IWC, however, differ greatly among the SCMs. Figure 10 shows the LWC field. Several models significantly underestimate the LWC, including CSU, McRAS, PNNL, SCAM, SCCM3_SIO, and SCRIPPS. The GISS SCM overestimates the LWC during period B. GFDL produces excessive LWC in a narrow layer around 815 hPa between 1200 UT, 2 March and 2400 UT, 3 March, and the ECHAM5-simulated LWC is also quite large at this level.
 Unlike in the CRMs, the errors in the SCM-produced LWC cannot easily be related to the errors in their simulated cloud fraction. For instance, most models except GISS greatly underestimate the observed LWC between 865 hPa and 565 hPa from 1200 to 1800 UT, 2 March. However, all of them (except ECHAM5 and SCAM) produce cloud amounts comparable to the observed at these levels during this period, because the SCM-generated stratiform cloud fraction depends mainly on the grid-mean relative humidity rather than on the condensate amount.
 The cloud ice water content produced by the SCMs is shown in Figure 11. Note that the IWC in the SCMs does not include snow, since most SCMs did not provide this field; the exception is GFDL, which does not distinguish between ice and snow in its calculation. Figure 11 shows significant differences among the SCM simulations. GFDL and GISS produce IWC much larger than the other models and even larger than the observed value, which includes snow. CSU, McRAS, SCAM, SCCM3_SIO, and SCRIPPS underestimate IWC. It is well known that the NCAR CCM3 global model has much lower IWC, and the prognostic cloud parameterization scheme of Rasch and Kristjánsson  in CAM2 was tuned to yield agreement between CCM3 and CAM2; the similar magnitude of IWC in SCAM and SCCM3_SIO is consistent with this. The IWC maximum in GFDL is around 715 hPa, much lower than in GISS and the observations. In PNNL the IWC extends from the upper troposphere to near the surface, which may reflect problems in determining the ice phase in this model. Once again, the model-produced IWC is only weakly correlated with the cloud fraction field in most SCMs.
4.2. Vertical Distribution of Cloud-Related Fields Averaged Over Periods A, B, and C
 To further assess the model strengths and weaknesses in capturing the frontal clouds during the different stages of the frontal passage, we now examine the mean vertical structure of cloud-related variables averaged over the prefrontal, frontal, and postfrontal periods (A, B, and C, respectively), as defined in section 3. To help interpret the cloud fields, the model-simulated large-scale variables, such as temperature, moisture, and relative humidity, are also examined. In addition, the cloud types diagnosed by the ISCCP simulator described in the work of Klein and Jakob  and Webb et al. , which is implemented in some SCMs (i.e., GFDL, GISS, and ECHAM5), are compared with ISCCP data. The ISCCP simulator was developed to diagnose model clouds the way a satellite would view an atmosphere whose physical properties (e.g., cloud height, cloud cover, and optical depth) are specified by the model.
4.2.1. Period A: Weak Prefrontal Clouds
Figure 12a compares the vertical structures of model and ARSCL clouds averaged over period A. Consistent with the earlier discussion, given the observed large-scale forcing fields, both SCMs and CRMs are able to reproduce the bulk structure of the observed high and middle clouds associated with the weaker cloud band. Both the observations and the models show most clouds in the upper troposphere and no clouds below 665 hPa. However, most SCMs and CRMs overestimate the observed cloud amount above 415 hPa, even though most models initiate clouds 3 hours late, as shown in Figures 6–7. In contrast, the ISU CRM and the CSU SCM generally underestimate the observed clouds, mainly because these two models generate clouds almost 6 hours later than observed. Both the SCMs and the CRMs differ greatly in their simulated high cloud amounts; the differences can be as large as 30%, for example between CSU_SAM and ISU among the CRMs and between GISS and CSU among the SCMs.
Figures 12b and 12c are the same as Figure 12a but for LWC and IWC, respectively. As mentioned earlier, the SCM-produced cloud ice does not include snow, whereas the observed and CRM-produced IWC do. Both the observations and the models show rather small cloud liquid and ice water contents in the weak prefrontal system, except for the GISS model, which produces considerably more cloud ice than the observations and the other models.
 To further understand these cloud biases, we examine the model-simulated large-scale variables, since they can strongly affect cloud formation; this is especially true of relative humidity, a major control variable in most cloud parameterizations used in the SCMs. Figures 12d and 12e show the temperature and moisture departures from the ARM observations averaged over period A, respectively. For temperature (Figure 12d), most models produce a rather small cold bias above 565 hPa and a small warm bias below, and the biases become slightly larger above 215 hPa and near the surface. SCAM and UCLA/LaRC differ from the other models in the lower troposphere: SCAM produces a small cold bias between 815 hPa and 665 hPa and a rather large warm bias below 815 hPa, while UCLA/LaRC produces a cold bias near the surface. For moisture (Figure 12e), the biases in all model simulations are very small during this period, except for SCCM3_SIO, which shows a relatively large moist bias below 865 hPa. Almost all models are slightly moister than the observations above 565 hPa and drier below.
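The period-averaged departures in Figures 12d–12e amount to averaging model-minus-observation profiles over the time samples in a period. A minimal sketch, assuming model output and observations share the same times and pressure levels; the function name and example values are hypothetical, and the period bounds are placeholders because the exact boundaries of periods A, B, and C are defined in section 3.

```python
import numpy as np

def period_mean_bias(t_hours, model, obs, t_start, t_end):
    """Mean model-minus-observation profile over t_start <= t < t_end.

    t_hours:    (nt,) sample times, hours since model start
    model, obs: (nt, nlev) e.g. temperature (K) or water vapor (g/kg)
    """
    t = np.asarray(t_hours, dtype=float)
    sel = (t >= t_start) & (t < t_end)
    if not sel.any():
        raise ValueError("no samples in the requested period")
    diff = np.asarray(model, dtype=float)[sel] - np.asarray(obs, dtype=float)[sel]
    return diff.mean(axis=0)

# Hypothetical 3-hourly samples on two levels; the window [0, 6) h stands in for a period
t = np.array([0.0, 3.0, 6.0, 9.0])
obs = np.zeros((4, 2))
model = np.array([[1.0, -1.0],
                  [3.0, -3.0],
                  [10.0, 10.0],
                  [10.0, 10.0]])
bias = period_mean_bias(t, model, obs, 0.0, 6.0)
```

Averaging the difference (rather than differencing two averages taken on different samples) guarantees that the model and observations are compared at the same times.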
 In contrast to the small errors in temperature and moisture, both SCMs and CRMs considerably overestimate the observed relative humidity in the middle and upper troposphere, especially between 415 hPa and 265 hPa (Figure 12f). This is consistent with the excessive clouds produced at these levels by most models. Surprisingly, however, the models differ greatly in cloud fraction despite the rather small intermodel differences in simulated relative humidity. This indicates that uncertainties in the SCM cloud parameterizations, such as the threshold relative humidity, largely account for the cloud biases produced during this period.
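To illustrate how strongly a threshold relative humidity can control diagnosed cloud amount, the sketch below uses a Sundqvist-type diagnostic cloud fraction, C = 1 - sqrt((1 - RH)/(1 - RHc)) for RH above a threshold RHc. This form is chosen only as a familiar example of an RH-based diagnostic scheme; it is not claimed to be the scheme used in any of the SCMs here, and the threshold values are hypothetical.

```python
import numpy as np

def sundqvist_cloud_fraction(rh, rh_crit):
    """Sundqvist-type diagnostic cloud fraction from grid-mean RH (fractions).

    C = 1 - sqrt((1 - RH) / (1 - RHc)) for RHc < RH < 1,
    C = 0 for RH <= RHc and C = 1 at saturation. Requires rh_crit < 1.
    """
    rh = np.clip(np.asarray(rh, dtype=float), 0.0, 1.0)
    ratio = np.clip((1.0 - rh) / (1.0 - rh_crit), 0.0, 1.0)
    return 1.0 - np.sqrt(ratio)

# Same grid-mean RH (90%), three hypothetical thresholds
for rh_c in (0.70, 0.80, 0.90):
    print(f"RHc = {rh_c:.2f}: cloud fraction = {sundqvist_cloud_fraction(0.90, rh_c):.2f}")
```

With identical grid-mean RH, changing only the threshold moves the diagnosed cloud fraction from zero to roughly 40% in this example, the kind of sensitivity that can produce large intermodel cloud differences despite similar relative humidity.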
 The model-generated cloud types and associated optical properties can be evaluated with ISCCP data. Figures 3d, 3g, and 3j show the SGP domain-averaged cloud-top pressure and optical thickness histograms from the ECHAM5, GFDL, and GISS simulations, respectively, averaged over the daytime data points within period A for a consistent comparison with the ISCCP clouds. As discussed earlier, the observations (Figure 3a) show that optically thin high and middle-level clouds are the main cloud types in the weaker cloud band. Compared with the satellite measurements, ECHAM5 and GFDL both correctly capture the optically thin transparent cirrus clouds (upper left corner of the diagram) but completely miss the high and middle-level cirrus and altocumulus clouds. The cloud types generated by the GISS model also differ greatly from the ISCCP data: it substantially overestimates the optically thin clouds (τ < 1.3) between 440 hPa and 310 hPa and completely misses the optically thin transparent cirrus and the optically thin middle-top clouds between 650 hPa and 500 hPa. The lack of clouds with tops at middle levels in these models was also found in their 3-D GCM simulations [Zhang et al., 2005] and in previous studies [Klein and Jakob, 1999; Ryan et al., 2000].
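The cloud-top pressure versus optical thickness histograms in Figure 3 bin cloudy samples into ISCCP categories. A minimal sketch, assuming the standard ISCCP category edges (cloud-top pressure at 50, 180, 310, 440, 560, 680, 800, and 1000 hPa; optical thickness at 0.3, 1.3, 3.6, 9.4, 23, 60, and 379) and expressing each bin as a percentage of all samples, clear ones included. It is not the ISCCP simulator itself, which also generates subgrid cloud columns and mimics the satellite retrieval; the function name and sample values are hypothetical. Optical thickness is dimensionless.

```python
import numpy as np

# Standard ISCCP category boundaries (assumed here)
CTP_EDGES = np.array([50.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 1000.0])  # hPa
TAU_EDGES = np.array([0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 379.0])  # dimensionless

def ctp_tau_histogram(ctp_hpa, tau, n_total):
    """Joint cloud-top pressure / optical thickness histogram in percent cover.

    ctp_hpa, tau: 1-D arrays for cloudy samples only
    n_total:      number of samples including clear ones
    Returns a (7, 6) array: rows are CTP bins from 50 hPa (highest tops) down to
    1000 hPa, columns are tau bins from thin to thick. Samples with tau < 0.3
    fall outside the edges and are not counted.
    """
    counts, _, _ = np.histogram2d(ctp_hpa, tau, bins=[CTP_EDGES, TAU_EDGES])
    return 100.0 * counts / n_total

# Hypothetical daytime samples: thin cirrus, thick high-top cloud, low cloud
ctp = np.array([250.0, 250.0, 900.0])
tau = np.array([0.5, 70.0, 5.0])
hist = ctp_tau_histogram(ctp, tau, n_total=10)
```

Because a deep cloud layer is classified only by its top, middle-level condensate beneath a high top lands in the high-top rows, which matters when interpreting the apparent absence of middle-top clouds in the model histograms.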
4.2.2. Period B: Strong Frontal Clouds
Figure 13 is the same as Figure 12 but for period B, during which the major frontal cloud system passed over the ARM SGP site. Both the ISCCP data and the ARM ARSCL cloud data indicate that the SGP site experienced a large amount of frontal cloud. The ARSCL data show two cloud maxima in the vertical distribution, one near 315 hPa and one at 715 hPa (Figure 13a, black line). None of the models captures this observed vertical structure well (Figure 13a), although some (e.g., PNNL and SCCM3_SIO) show a hint of a midtropospheric local minimum. Compared with the observations, most models produce excessive clouds throughout most of the troposphere from 765 hPa to 215 hPa. The ECHAM5 and SCAM SCMs and all the CRMs produce more clouds in the upper layer and substantially fewer clouds below 565 hPa. Another important feature in Figure 13a is the absence of boundary layer clouds in almost all the SCM and CRM simulations; GISS and PNNL are the only models that produce some boundary layer clouds. As mentioned earlier, the ARSCL clouds near the surface are most likely precipitation rather than clouds, since the cloud base information was contaminated by precipitation-sized drops during the heavy rainfall period.
Figure 13b clearly shows that all the SCMs and CRMs significantly underestimate the observed cloud liquid water during period B. Only a few SCMs (e.g., ECHAM5, GFDL, GISS, and SCAM) produce relatively large (though still underestimated) cloud liquid water values. Interestingly, the GISS-produced LWC maximum is located higher than the LWC maximum observed and simulated by the other models; the reason needs further investigation.
 For cloud ice water content (Figure 13c), the observed profile is generally captured by the CRMs. However, most of them (except CSU_SAM) produce more ice than observed in the lower troposphere between 865 hPa and 565 hPa. Recall that these models generate much less cloud liquid water than observed at these levels (Figure 13b); this may reflect differences in the threshold temperature for cloud phase used in the models and in the observations. As discussed before, it is hard to evaluate the SCM-produced cloud ice, since snow is not included (except in GFDL). Nevertheless, GFDL and GISS greatly overestimate the observed cloud ice water, and most of the IWC in GFDL is located much lower than observed. PNNL also shows an unrealistic vertical profile of cloud ice water, which extends down to near the surface.
 The pattern of the model temperature biases in period B is similar to that in period A but becomes larger (still less than 4 K for most models) as the strong cloud band passed over the site (Figure 13d). Most models show a cold bias above 565 hPa and a warm bias below 865 hPa. The exceptions are PNNL and SCAM. PNNL produces a much colder atmosphere at both upper and lower levels than the other models and the observations, which might be due to the nonpenetrative convection scheme used in this model; note that almost 40% of the precipitation in this model comes from convection. SCAM shows a rather large warm bias below 865 hPa. For moisture (Figure 13e), most models produce a small moist bias throughout the troposphere, except for CSU, PNNL, and SCAM, which show a small dry bias in the lower troposphere, and SCCM3_SIO, which shows a large moist bias below 865 hPa.
Figure 13f shows that all the models overestimate the observed relative humidity above 465 hPa, and most models are near saturation in the middle and upper troposphere. Comparison with Figure 13a shows that most SCM-generated clouds are closely correlated with their relative humidity fields, whereas these two fields are poorly correlated in all the CRMs. As discussed earlier, this is because cloud fraction in the CRMs is mainly related to the cloud microphysical fields and depends less on relative humidity. The cloud amount in SCAM decreases sharply below 365 hPa (Figure 13a), likely because this model uses a high threshold RH (>90%) for cloud formation. Also noteworthy is that all the CRMs and most SCMs (except SCAM) produce relative humidity larger than or comparable to the observations in the lower troposphere, while all the models substantially underestimate the clouds below 715 hPa. As discussed earlier, this might be related to the specified warm and dry large-scale advective forcing and to uncertainties in determining the radar cloud base during the strong frontal precipitation.
Figures 3e, 3h, and 3k show the cloud types produced by ECHAM5, GFDL, and GISS, respectively, for the strong frontal period. Compared with the ISCCP cloud types, all three models capture the high-top optically thick deep convective clouds, but with larger optical depths and cloud amounts than observed. The observed middle-top optically thick clouds are almost completely missed by these models. ECHAM5 and GFDL also produce no low-top clouds, whereas GISS overestimates the low-top clouds relative to the ISCCP data. ECHAM5 also diagnoses a large amount of high-top clouds of medium optical thickness that do not appear in the ISCCP data. Note that the lack of middle-top clouds in these histograms does not mean that the models produce no clouds at middle levels. Rather, it suggests that the middle-level clouds produced by these models (Figures 7 and 13a) belong to cloud layers with high tops and are therefore classified as high-top rather than middle-top clouds.
4.2.3. Period C: Postfrontal Clouds
 For the postfrontal system, both the satellite and the ARSCL data indicate extensive low clouds and a considerable amount of middle and high clouds. Figure 14a shows that most models (except SCAM and CSU_SAM) generally capture the low postfrontal clouds but show rather large differences from the observations above 815 hPa. SCAM and CSU_SAM produce far fewer low clouds than observed. All the CRMs substantially underestimate the observed clouds above 765 hPa. Among the SCMs, the simulated cloud amounts differ greatly above 815 hPa. In general, most SCMs produce excessive clouds between 815 hPa and 565 hPa, too few clouds between 565 hPa and 415 hPa, and excessive clouds above 415 hPa. SCCM3_SIO overestimates the cloud amount in the layer between 665 hPa and 365 hPa, where most of the other models underestimate the clouds. The CSU SCM significantly overestimates the ARSCL clouds throughout most of the troposphere except at the lowest level. As indicated earlier, this could be related to the “0” or “1” cloud fraction assumption used in this model.
 As in periods A and B, most models underestimate the observed LWC, except GFDL, which greatly overestimates it between 815 hPa and 665 hPa (Figure 14b). Both SCMs and CRMs produce almost no IWC after the frontal system passed over the SGP site (Figure 14c), whereas the cloud radar observed a considerable amount of IWC. For the CRMs, this is consistent with their underestimation of cloud fraction at middle and high levels (Figure 14a).
 Both the temperature and moisture biases (Figures 14d–14e) are quite large after the frontal passage, but their vertical pattern is the same as in period B. Large differences in relative humidity are seen between the observations and the models and among the models. These errors in the large-scale fields may stem from the models' failure to capture the strong frontal system well; in turn, the unrealistic atmospheric state degrades the simulation of the postfrontal cloud system.
Figures 3f, 3i, and 3l show the cloud types generated by ECHAM5, GFDL, and GISS, respectively, in the postfrontal system. The ISCCP clouds (Figure 3c) indicate optically medium and thick clouds between 800 hPa and 180 hPa, with the maximum in the lower troposphere. Compared with the ISCCP data, the three models produce more clouds with optical depth larger than 60 but fail to produce the middle-top clouds of medium optical thickness. The GISS model produces more optically thin low-top clouds. The ECHAM5 model correctly produces the optical depth of the low-top clouds, but their tops are lower than observed. The GFDL model fails to produce low-top clouds of medium optical thickness while overestimating the optically thick low clouds.
5. Summary and Discussions
 The overall performance of 9 SCMs and 4 CRMs in simulating a strong midlatitude frontal cloud system has been extensively compared with ARM field measurements from the spring 2000 Cloud IOP. To complement the ARM single-station cloud measurements, satellite retrievals have been used to provide the horizontal distribution of the frontal clouds. The models were driven by the large-scale forcing derived from the ARM observations using a variational analysis approach. All the SCMs except SCCM3_SIO used physically based prognostic equations for cloud liquid and ice water contents, but they used a mix of cloud fraction schemes, including relative-humidity-based diagnostic schemes, statistical cloud schemes, and prognostic cloud schemes. The cloud fraction in the CRMs is determined by the model-produced cloud water and cloud ice, which are obtained from various bulk cloud microphysics schemes.
 We have shown that both the SCMs and CRMs can typically capture the bulk characteristics of the frontal system, such as the high prefrontal cloud and the low postfrontal cloud. The frontal precipitation is well generated by all the models. However, there are significant differences in detailed structures of the frontal clouds between the observations and the model simulations. For the strong frontal cloud system, nearly all the models produced too many high clouds. All the CRMs substantially underestimated clouds in the middle and low levels while most SCMs overestimated clouds there. For the postfrontal clouds, the CRMs missed the middle and high-level clouds while SCMs produced the middle and high-level clouds that are significantly different from the observations. Many of these model biases could be traced to the lack of subgrid-scale dynamical structure in the applied forcing fields and the lack of organized mesoscale hydrometeor advections. Moreover, the use of 2-D CRMs may partially account for the failure to generate the mesoscale frontal circulations in the CRM simulations since there is no north-south temperature gradient in 2-D CRM simulations. As a result, all the models generated frontal clouds that have a longer lifetime than the observed, and the model clouds were generated a few hours later than those observed.
 The cloud fraction in the CRMs is dependent on the model-generated cloud liquid water and cloud ice water. All of the CRMs significantly underestimated the observed cloud liquid water content during the strong frontal passage and underestimated the observed cloud ice water content in the postfrontal clouds. The first bias is related to the lack of organized mesoscale structure to support stratiform precipitation, and the second bias is related to the lack of ice advection by the cyclonic air behind the cyclone in the models. As a result, these models produced far fewer frontal clouds in the middle and low levels and far fewer postfrontal clouds in middle and high levels than the observations. The cloud water is also greatly underestimated in most SCMs in the strong frontal clouds. However, this bias cannot be easily linked to the error in the SCM-produced cloud fields since the domain mean relative humidity is a dominant factor constraining the cloud formation in most SCMs.
 The dependence of model clouds on the large-scale state fields differs among cloud parameterizations, even among relative humidity based cloud fraction schemes. This study has shown that model cloud fractions can differ greatly among these models even when they produce similar relative humidity fields. This also illustrates the sensitivity of model clouds to the threshold relative humidity in relative humidity based cloud schemes.
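To illustrate this sensitivity, the sketch below uses the Sundqvist-type diagnostic form, C = 1 - sqrt((1 - RH)/(1 - RH_c)) for RH > RH_c and C = 0 otherwise, as a stand-in for relative humidity based schemes in general; it is not claimed to be the scheme of any particular SCM in this study, and the threshold values are illustrative assumptions. With identical grid-mean relative humidity, changing only the threshold RH_c changes the diagnosed cloud fraction substantially.

```python
import numpy as np


def rh_cloud_fraction(rh, rh_crit):
    """Sundqvist-type diagnostic cloud fraction.

    rh: grid-mean relative humidity as a fraction (values above 1 are capped).
    rh_crit: threshold relative humidity (must be < 1); no cloud below it.
    """
    rh = np.clip(np.asarray(rh, dtype=float), 0.0, 1.0)
    # For rh <= rh_crit the ratio exceeds 1, so the sqrt stays finite; np.where
    # then discards that branch and returns zero cloud.
    frac = np.where(rh > rh_crit,
                    1.0 - np.sqrt((1.0 - rh) / (1.0 - rh_crit)),
                    0.0)
    return frac


# Same grid-mean RH, three different threshold values.
for rh_crit in (0.80, 0.85, 0.90):
    print(rh_crit, float(rh_cloud_fraction(0.9, rh_crit)))
```

For a grid-mean relative humidity of 0.9, thresholds of 0.80, 0.85, and 0.90 give cloud fractions of roughly 0.29, 0.18, and 0, respectively.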
 The SCMs have shown rather large intermodel differences in the simulated clouds and microphysical fields, mainly due to the different parameterizations of cloud fraction and cloud microphysics used in these models. It has been shown that several arbitrary parameters used in these schemes, such as the threshold temperature for distinguishing the cloud condensate phase and the threshold relative humidity for cloud formation, can greatly affect model cloud fractions and cloud condensates. Constraining these parameters with available observations would help reduce the intermodel differences in these cloud fields.
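A minimal sketch of one such parameter: many schemes partition condensate between liquid and ice with a temperature ramp, treating condensate as all ice below a threshold temperature and all liquid above freezing. The linear ramp, the function names, and the threshold values below are assumptions for illustration only; the SCMs in this study use their own formulations.

```python
import numpy as np


def liquid_fraction(temp_k, t_ice, t_liq=273.15):
    """Liquid share of condensate: 0 at or below t_ice, 1 at or above t_liq,
    linear in between (an assumed ramp; schemes differ)."""
    t = np.asarray(temp_k, dtype=float)
    return np.clip((t - t_ice) / (t_liq - t_ice), 0.0, 1.0)


def partition(condensate, temp_k, t_ice):
    """Split total condensate into (liquid, ice) for a given threshold t_ice."""
    fl = liquid_fraction(temp_k, t_ice)
    cond = np.asarray(condensate, dtype=float)
    return cond * fl, cond * (1.0 - fl)


# 1e-4 kg/kg of condensate at -20 C under two choices of the ice threshold.
for t_ice in (233.15, 253.15):  # -40 C and -20 C
    liq, ice = partition(1.0e-4, 253.15, t_ice)
    print(t_ice, float(liq), float(ice))
```

At -20 °C, a threshold of -40 °C assigns half the condensate to liquid, whereas a threshold of -20 °C assigns all of it to ice, so the same total condensate yields very different cloud water and cloud ice amounts.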
 Comparing model clouds with single-point cloud measurements always raises concerns about representativeness. Averaging the ARSCL clouds from their native 10-s and 45-m time and height intervals onto 3-hour and 25-hPa intervals improves the representation of clouds over the SGP domain, especially for a frontal system with strong horizontal advection. The problem is further reduced in this study by using satellite retrievals, such as the GOES cloud images, to provide the horizontal distribution of clouds, which helps justify our comparison results. Moreover, the ISCCP satellite cloud type data combined with the ISCCP simulator allow a more consistent comparison of cloud types and cloud radiative properties between the models and the satellite measurements.
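The averaging step can be sketched as follows, assuming a binary ARSCL-style cloud mask sampled every 10 s at 45-m range gates, with a pressure assigned to each gate and held fixed in time for simplicity (an assumption; in practice gate pressures would come from soundings). The fraction of cloudy samples in each 3-hour by 25-hPa bin serves as the observed cloud fraction. The function and parameter names are hypothetical.

```python
import numpy as np


def bin_cloud_mask(mask, times_s, gate_p_hpa, dt_s=3 * 3600, dp_hpa=25.0,
                   p_top=100.0, p_bottom=1000.0):
    """Average a binary cloud mask onto coarse time-pressure bins.

    mask: (n_times, n_gates) array, 1 where cloud was detected, else 0.
    times_s: (n_times,) sample times in seconds from the start of the period.
    gate_p_hpa: (n_gates,) pressure of each range gate (assumed fixed in time).
    Returns (t_edges, p_edges, frac), where frac[i, j] is the fraction of
    cloudy samples in time bin i and pressure bin j (NaN if no samples).
    """
    mask = np.asarray(mask, dtype=float)
    times_s = np.asarray(times_s, dtype=float)
    gate_p_hpa = np.asarray(gate_p_hpa, dtype=float)

    # Attach a time and a pressure to every (time, gate) sample.
    tt, pp = np.meshgrid(times_s, gate_p_hpa, indexing="ij")
    t_edges = np.arange(0.0, times_s.max() + dt_s, dt_s)
    p_edges = np.arange(p_top, p_bottom + dp_hpa, dp_hpa)

    # Total samples and cloudy samples per bin.
    counts, _, _ = np.histogram2d(tt.ravel(), pp.ravel(),
                                  bins=[t_edges, p_edges])
    cloudy, _, _ = np.histogram2d(tt.ravel(), pp.ravel(),
                                  bins=[t_edges, p_edges],
                                  weights=mask.ravel())
    with np.errstate(invalid="ignore", divide="ignore"):
        frac = cloudy / counts
    return t_edges, p_edges, frac


# Toy 6-hour record: one gate cloudy for the first 3 hours only.
times = np.arange(0.0, 21600.0, 10.0)
gate_p = np.array([910.0, 880.0])
mask = np.zeros((times.size, gate_p.size))
mask[: times.size // 2, 0] = 1.0
t_edges, p_edges, frac = bin_cloud_mask(mask, times, gate_p)
```

Bins with no samples are returned as NaN rather than zero, so that missing observations are not mistaken for clear sky.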
 The ISCCP simulator described in the work of Klein and Jakob  and Webb et al.  was implemented in three SCMs (ECHAM5, GFDL, and GISS) to diagnose model clouds in a form quantitatively comparable to the ISCCP retrievals. This study has shown that the model-generated cloud types differ significantly from the ISCCP data, although all three models produced roughly correct cloud amounts. In general, all three models produced many more high-top optically thick clouds and far fewer middle-top clouds than the satellite observations for the strong frontal cloud system. For the postfrontal system, they underestimated the low-top optically medium and thick clouds and failed to capture the middle-top optically medium clouds, but they overestimated the high- and middle-top clouds with optical thicknesses larger than 60. The underestimation of middle-top clouds is also found in climate simulations with GCMs [Zhang et al., 2005] and in other studies [e.g., Klein and Jakob, 1999; Ryan et al., 2000; Norris and Weaver, 2001; Tselioudis and Jakob, 2002]. This is a major deficiency of cloud simulations by current climate models. The case study presented in this paper links this model problem to a specific frontal process. However, the actual causes of these model biases can only be ascertained through further analysis of individual parameterizations.
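For reference, ISCCP-style cloud types are defined by cloud-top pressure and optical thickness. The sketch below uses the standard ISCCP boundaries (high: cloud-top pressure below 440 hPa; low: above 680 hPa; thin: optical thickness below 3.6; thick: above 23) and the joint-histogram edges commonly used with the ISCCP simulator. How each model's simulator output was binned in this study is not shown here, so treat the code as illustrative. Pixels with optical thickness below 0.3 fall outside the histogram, mimicking a detection limit (an assumption of this sketch).

```python
import numpy as np

# ISCCP-style joint-histogram edges: 7 cloud-top pressure bins x 6 tau bins.
CTP_EDGES = np.array([50, 180, 310, 440, 560, 680, 800, 1000], dtype=float)  # hPa
TAU_EDGES = np.array([0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 379.0])


def isccp_category(ctp_hpa, tau):
    """Classify one cloudy pixel by cloud-top pressure and optical thickness."""
    if ctp_hpa < 440.0:
        level = "high"
    elif ctp_hpa < 680.0:
        level = "middle"
    else:
        level = "low"
    if tau < 3.6:
        thickness = "thin"
    elif tau < 23.0:
        thickness = "medium"
    else:
        thickness = "thick"
    return level, thickness


def joint_histogram(ctp_hpa, tau):
    """Count pixels in the cloud-top-pressure / optical-thickness grid.

    Values outside the edge ranges (e.g. tau < 0.3) are not counted.
    """
    hist, _, _ = np.histogram2d(ctp_hpa, tau, bins=[CTP_EDGES, TAU_EDGES])
    return hist


print(isccp_category(300.0, 70.0))
print(joint_histogram([300.0, 600.0, 850.0], [70.0, 10.0, 0.1]))
```

In this binning, the clouds with optical thicknesses larger than 60 discussed above occupy the last tau column, a subset of the "thick" category.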
 The current large model intercomparison study can serve as a baseline for further model improvements. More in-depth analysis of these issues could be done in follow-up studies by individual researchers. For example, comparing results from 3-D CRMs and/or high-resolution mesoscale regional models (e.g., C. P. Weaver et al., Dynamical controls on sub-GCM grid-scale cloud variability for ARM case 4, submitted to Journal of Geophysical Research, 2005) could provide valuable insights into the model problems revealed in this study. Sensitivity tests of the SCMs and CRMs driven by forcing data from high-resolution regional models, which can contain information on subgrid-scale dynamics and hydrometeor advections, could help illustrate how important these processes are for the SCMs and CRMs to capture frontal cloud systems correctly. In addition, the comparison between model-produced clouds and single-point cloud measurements could be improved by using the probabilistic approach recently proposed by Jakob et al.  if the model integration period is long enough to make the evaluation results statistically significant. This approach interprets model cloud predictions as probabilistic forecasts at the observation point. Jakob et al. showed that it yields more meaningful model evaluation than traditional methods, such as simply averaging the observations onto the model time interval. These are subjects for future study.
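The idea of treating model cloud fraction as a probabilistic forecast can be illustrated with a standard verification measure. The sketch below uses the Brier score and a Brier skill score relative to a climatological forecast; these are generic probabilistic-forecast metrics chosen for illustration, not necessarily the measures used by Jakob et al., and the data are made up.

```python
import numpy as np


def brier_score(forecast_prob, observed):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    p = np.asarray(forecast_prob, dtype=float)
    o = np.asarray(observed, dtype=float)
    return float(np.mean((p - o) ** 2))


def brier_skill_score(forecast_prob, observed):
    """Skill relative to always forecasting the observed cloud frequency.

    Undefined (division by zero) if the observations are all 0 or all 1.
    """
    o = np.asarray(observed, dtype=float)
    reference = brier_score(np.full_like(o, o.mean()), o)
    return 1.0 - brier_score(forecast_prob, o) / reference


model_cf = [0.8, 0.2, 0.5, 0.1]   # model cloud fraction in each time-height bin
obs_cloud = [1, 0, 1, 0]          # 1 if cloud was detected at the site
print(brier_score(model_cf, obs_cloud), brier_skill_score(model_cf, obs_cloud))
```

Here the model cloud fraction in each bin is read as the probability of observing cloud at the site, and the radar/lidar detection is the binary outcome; a positive skill score indicates that the model beats a constant climatological forecast.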
 This research was supported primarily under the U.S. Department of Energy Atmospheric Radiation Measurement (ARM) Program. We wish to thank James Hack for making the SCAM available to our study. ISCCP data were obtained from NASA LaRC. Work at LLNL was performed under the auspices of the U.S. Department of Energy (DOE) Office of Science, Biological and Environmental Research by the University of California, Lawrence Livermore National Laboratory, under contract W-7405-Eng-48. Work at NASA Langley Research Center was partially supported by the Department of Energy's ARM program, under interagency agreement DE-AI02-02ER63318 (Xu), and by the NASA EOS interdisciplinary study program (Xu, Eitzen). Work at CSU was performed under ARM grant DE-FG03-95ER61968. Work at BNL was supported by the ARM Program. Work for CCM3/SIO was supported by ARM grant DE-FG02-03ER63532. Work at PNNL was supported by the ARM Program. PNNL is operated for the DOE by Battelle Memorial Institute under contract DE-AC06-76RL01830. Work at SUNY Stony Brook was supported by ARM grant DE-FG02-98ER62570 and was also supported by NSF under grant ATM9701950. Work at the University of Utah was supported under ARM grant DE-FG03-94ER61769. Work at Scripps was supported in part by ARM grant DE-FG03-97-ER62338, by NOAA under grant NA77RJO453, and by NSF under grant ATM-9613764. Work at Dalhousie University was supported by the MOC2 project jointly funded by NSERC, CFCAS, and MSC. Work at Iowa State University was supported by ARM grant DE-FG02-02ER63483. Work at Goddard Space Flight Center was supported by NASA HQ funding through the GMAP Program. Work at NASA Goddard Institute for Space Studies was supported by ARM. Work at GFDL was performed under the ARM program from interagency agreement DE-AI02-03ER63562. The Climate Data Analysis Tools (CDAT) that were developed in the Program for Climate Model Diagnosis and Intercomparison (PCMDI) were used to perform our analyses.