Cluster analysis of cloud regimes and characteristic dynamics of midlatitude synoptic systems in observations and a model



[1] Global climate models typically do not correctly simulate cloudiness associated with midlatitude synoptic systems, both because coarse grid spacing prevents them from resolving dynamics occurring at smaller scales and because no adequate parameterizations exist for the effects of these subgrid-scale dynamics. Comparison of modeled and observed cloud properties averaged over similar regimes (e.g., compositing) aids the diagnosis of simulation errors and identification of meteorological forcing responsible for producing particular cloud conditions. This study uses a k-means clustering algorithm to objectively classify satellite cloud scenes into distinct regimes based on grid box mean cloud fraction, cloud reflectivity, and cloud top pressure. The spatial domain is the densely instrumented southern Great Plains site of the Atmospheric Radiation Measurement Program, and the time period is the cool season months (November–March) of 1999–2001. As a complement to the satellite retrievals of cloud properties, lidar and cloud radar data are analyzed to examine the vertical structure of the cloud layers. Meteorological data from the constraint variational analysis are averaged for each cluster to provide insight into the large-scale dynamics and advective tendencies coincident with specific cloud types. Meteorological conditions associated with high and low subgrid spatial variability are also investigated for each cluster. Cloud outputs from a single-column model version of the GFDL AM2 atmospheric model, forced with meteorological boundary conditions derived from observations and a numerical weather prediction model, are compared to observations for each cluster in order to determine the accuracy with which the model reproduces attributes of specific cloud regimes.

1. Introduction

[2] Clouds remain the primary source of uncertainty in the modeling of current climate and predictions of future climate [Intergovernmental Panel on Climate Change (IPCC), 2001]. Many previous studies evaluated global climate model (GCM) reliability by comparing large spatial and temporal averages of simulated and observed cloud properties, such as zonal and seasonal means [e.g., Weare and AMIP Modeling Groups, 1996]. One shortcoming of this approach is that it can obscure the presence of compensating errors. For example, Norris and Weaver [2001] demonstrated that overprediction of shortwave cloud radiative forcing under conditions of ascent largely balanced underprediction of shortwave cloud radiative forcing under conditions of descent such that a realistic midlatitude ocean radiation climatology occurred in the National Center for Atmospheric Research (NCAR) Community Climate Model version 3 for the wrong reasons. Moreover, comparisons of long-term averages do not provide much information about reasons for correct or incorrect simulation of cloud properties.

[3] A complementary method of model cloud evaluation is compositing, i.e., averaging cloud properties grouped by similar parameters such as 500-mbar vertical velocity or sea level pressure [e.g., Bony et al., 1997; Klein and Jakob, 1999; Norris and Weaver, 2001; Tselioudis and Jakob, 2002; Jakob, 2003; Bony et al., 2004; Lin and Zhang, 2004]. This technique typically focuses on the connection between cloud properties and the dynamical processes that affect them. There are, however, several disadvantages associated with compositing on dynamical parameters. One is a lack of reliable data for atmospheric variables important for cloud formation (temperature, humidity, vertical velocity) over much of the globe. Moreover, as was pointed out by Jakob and Tselioudis [2003], compositing by dynamical parameter requires prior knowledge about the meteorological processes associated with the particular cloud regimes to be identified.

[4] Jakob and Tselioudis [2003] applied a clustering algorithm to International Satellite Cloud Climatology Project (ISCCP) [Rossow and Schiffer, 1999] histograms of cloud optical depth and cloud top pressure to identify dominant modes of cloud variability in the tropical western Pacific. This study uses a similar procedure to determine typical cloud regimes associated with extratropical cyclones for the ISCCP grid box centered on the southern Great Plains (SGP) site of the Atmospheric Radiation Measurement (ARM) Program [Ackerman and Stokes, 2003]. Only the cool season (November–March) is examined because that is the time of year when clouds in this region are primarily produced by large-scale synoptic systems. Although our study focuses on clouds over Oklahoma and Kansas due to the dense network of ARM measurements there, the use of globally available ISCCP data allows us to assess the extent to which our results can be generalized to other regions of the world. Lidar and cloud radar retrievals from the ARM SGP site provide information about the vertical distribution of cloudiness that is not available from satellite observations. Other ARM measurements have been used to constrain the water and energy budgets for reanalyzed output from a numerical weather prediction model over approximately a 3.5° × 3.5° area, enabling relatively accurate determination of large-scale horizontal and vertical advection of temperature and moisture in a volume approximately the size of a GCM grid column. This product, the Constraint Variational Analysis (CVA) [Zhang et al., 2001; Xie et al., 2004], is superior to analyses from numerical weather prediction models but is currently available only over the SGP site during January 1999 to March 2001. Meteorological parameters coincident with the satellite cloud scenes are averaged within cloud clusters to gain a better understanding of the dynamics coincident with various cloud types.

[5] Previous studies [e.g., Klein and Jakob, 1999; Norris and Weaver, 2001; Tselioudis and Jakob, 2002; Lin and Zhang, 2004] have demonstrated that GCMs have difficulty correctly simulating clouds associated with extratropical cyclones in large part because subgrid-scale processes and especially subgrid-scale vertical motion are not adequately represented [Katzfey and Ryan, 2000; Ryan et al., 2000]. One common GCM problem is production of frontal clouds that are too thick and too horizontally uniform, presumably due to lack of subgrid vertical motions that in the real world thicken clouds in one part of the grid box and dry out clouds in another part. The first step to improving GCMs is determining which cloud regimes are most problematic in models, and this study accomplishes that by comparing observed cloud properties averaged within each cloud cluster with those from the single-column model (SCM) version of the Geophysical Fluid Dynamics Laboratory (GFDL) Atmospheric Model (AM2) [GFDL Global Atmospheric Development Team, 2004]. Since the SCM receives realistic large-scale advective tendencies from the CVA, observed and simulated cloud properties should be largely the same if the GFDL AM2 correctly parameterizes subgrid processes. A second goal of this study is the identification of large-scale meteorological conditions that are associated with small or large subgrid spatial variability in frontal clouds, which will be useful for later parameterization. This is investigated by dividing cloud clusters according to instantaneous spatial variability in cloud reflectivity and cloud top pressure in each ISCCP scene and examining the differences in CVA meteorology and advective forcing for cases of high and low cloud variability. The CVA data, however, do not provide information about mesoscale and smaller-scale dynamical processes directly responsible for generating subgrid cloud variability. These must instead be investigated using a high-resolution model, as reported by Weaver et al. [2005].

2. Clustering of Cloud Data

[6] The primary source of cloud observations for this investigation was the ISCCP D1 equal-area (280 km × 280 km) data set, originally processed from radiances measured by geostationary weather satellites [Rossow et al., 1996; Rossow and Schiffer, 1999]. We examined the single grid box (35–37.5°N, 99.3–96.2°W) most closely collocated with the ARM SGP/Cloud and Radiation Testbed (CART), centered on 36.6°N, 97.5°W. The entire ISCCP grid box is located inside the boundary facilities of the SGP/CART site and the CVA domain. ISCCP data provide grid box mean cloud fraction, cloud top pressure and visible cloud optical thickness every three hours during daytime, usually averaged from 50–80 pixels about 4–7 km in size and spaced approximately 30 km apart. Other available data are the spatial standard deviations of pixel cloud top pressure and cloud optical thickness within the grid box and the relative frequencies of pixels occurring in seven cloud top pressure and six cloud optical thickness intervals (42 categories). We restricted our analysis to solar zenith angles less than 72° because cloud property retrievals may be inaccurate when the Sun is close to the horizon. The restriction of our analysis to daytime cases should not be significant due to the small diurnal variability in high clouds over the central United States in wintertime [Wylie and Woolf, 2002]. Because reflected radiation flux varies nonlinearly with cloud optical thickness, the unweighted average of cloud optical thickness values does not correspond to the average reflected radiation flux. We took this issue into account by converting cloud optical thickness values to cloud reflectivity at 0.6 microns using an ISCCP look-up table (corresponding to Figure 3.13 of Rossow et al. [1996]) before averaging. The liquid water conversion table was used for all clouds since cloud ice fraction was not readily available, but the resulting error is smaller than that from averaging cloud optical thickness instead of cloud reflectivity.
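
The effect of averaging in reflectivity space rather than optical thickness space can be illustrated with a short sketch. The ISCCP look-up table is not reproduced here; as a stand-in, the code assumes a simple two-stream approximation for a non-absorbing cloud, R = (1 − g)τ / [2 + (1 − g)τ], with an assumed asymmetry parameter g = 0.85. The function names and pixel values are hypothetical and chosen only to show the nonlinearity.

```python
import numpy as np

def reflectivity(tau, g=0.85):
    """Two-stream reflectivity of a non-absorbing cloud (assumed stand-in,
    NOT the ISCCP look-up table)."""
    scaled = (1.0 - g) * np.asarray(tau, dtype=float)
    return scaled / (2.0 + scaled)

def optical_thickness(refl, g=0.85):
    """Inverse of reflectivity(): optical thickness that yields `refl`."""
    refl = np.asarray(refl, dtype=float)
    return 2.0 * refl / ((1.0 - g) * (1.0 - refl))

# Hypothetical scene: half the pixels optically thin, half optically thick.
tau_pixels = np.array([1.0, 1.0, 40.0, 40.0])

mean_refl = reflectivity(tau_pixels).mean()         # average taken in reflectivity space
refl_of_mean_tau = reflectivity(tau_pixels.mean())  # reflectivity of the linear-mean thickness
tau_equivalent = optical_thickness(mean_refl)       # thickness matching the mean reflectivity

print(f"mean reflectivity            : {mean_refl:.2f}")
print(f"reflectivity of mean tau     : {refl_of_mean_tau:.2f}")
print(f"linear-mean tau / equiv. tau : {tau_pixels.mean():.1f} / {tau_equivalent:.1f}")
```

Under these assumptions the linear mean of optical thickness overstates the scene reflectivity, which is the bias the conversion before averaging is meant to avoid.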

[7] ISCCP 3-hourly data for the months of November–March were grouped into cloud regimes by applying a k-means clustering algorithm to gridbox mean cloud fraction, cloud reflectivity, and cloud top pressure. The k-means procedure classifies all data elements into a specified number of clusters such that within-cluster variance is minimized [Hartigan, 1975]. The only arbitrary parameter needed is the number of clusters; the character of the individual cluster means is then objectively determined by the data. For reasons described in the following paragraph, we chose to calculate six clusters. Values of cloud fraction, cloud reflectivity, and cloud top pressure were all converted to a scale varying linearly from 0 to 1 to ensure each parameter would contribute equally to clustering, and times with no cloudiness at all, approximately 5% of the total, were excluded from clustering. The clustering process began with random selection of six data elements as initial seeds, each element comprising a 3-hourly mean cloud fraction, cloud reflectivity, and cloud top pressure. Every other element in the data set was then assigned to the initial seed it was closest to in a Euclidean sense. The number of elements in a cluster divided by the total number of elements is the frequency of occurrence of the cluster, and the average of all elements in the cluster is the centroid. These cluster centroids became new seeds to reinitialize the clustering routine, which was repeated until the centroids converged.

[8] The most subjective aspect of the k-means method is specifying the number of clusters. After examining results for various numbers, we chose to use six because that was the minimum number of clusters that had clearly distinct cloud properties and meteorological conditions. Additional clusters overlapped preceding clusters without providing appreciable new information. For example, two of the six original clusters were patchy thin cirrus and extensive thicker cirrostratus, and a seventh cluster was merely cirrus of intermediate optical thickness and horizontal coverage. Inclusion of such intermediate clusters would increase the length of the paper and the number of plots without commensurately enhancing our understanding of dynamical and thermodynamical conditions associated with particular cloud types. Another uncertainty in the k-means method is the convergence of the clustering algorithm to different results for different initial seeds. We resolved this ambiguity by clustering on 100 different sets of random initial seeds and choosing the final cluster set with the least sum of variance around each cluster centroid. Only two alternate realizations occurred, and these were substantially similar to the minimum variance cluster set. The differences entailed the addition of a cirrus-type cluster and the combination of optically thick low-top clouds with optically thick high-top clouds in a single cluster rather than placing them in separate clusters.
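
The clustering procedure of the two preceding paragraphs (scaling to 0–1, random data elements as seeds, iteration to convergence, and selection of the minimum-variance result over many seed sets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the convergence test, and the synthetic three-regime data are all hypothetical.

```python
import numpy as np

def scale01(x):
    """Linearly rescale each column (parameter) to the range 0-1."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def kmeans(x, k, rng, max_iter=100):
    """One k-means run seeded with k randomly chosen data elements."""
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every element to the nearest centroid (Euclidean distance).
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # New centroid = mean of members; keep the old one if a cluster is empty.
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    # Final assignment and summed squared distance to the assigned centroids.
    dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    variance = float((dist.min(axis=1) ** 2).sum())
    return centroids, labels, variance

def best_kmeans(x, k, n_restarts=100, seed=0):
    """Repeat k-means from different random seeds; keep the least-variance result."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(x, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

# Synthetic stand-in for (cloud fraction, cloud top pressure, reflectivity) elements.
data_rng = np.random.default_rng(1)
centers = np.array([[0.8, 0.1, 0.2], [0.2, 0.1, 0.2], [1.0, 0.7, 0.4]])
raw = np.vstack([c + 0.03 * data_rng.standard_normal((50, 3)) for c in centers])
x = scale01(raw)

centroids, labels, variance = best_kmeans(x, k=3, n_restarts=20)
frequency = np.bincount(labels, minlength=3) / len(labels)  # cluster frequency of occurrence
```

Keeping the run with the smallest summed variance mirrors the selection rule described above; whether the alternate realizations differ materially still has to be judged by inspecting their centroids.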

[9] Our approach differs from that of Jakob and Tselioudis [2003] in that we cluster on three parameters (gridbox mean cloud fraction, cloud top pressure, and cloud reflectivity) rather than on 42 parameters (cloud fraction within each of seven cloud top pressure and six cloud optical thickness intervals). We chose to use a three-parameter grid box mean phase space because it is simpler and can be applied to GCMs that do not produce ISCCP-like output (which has nonnegligible computational and storage costs). One disadvantage of aggregation of the 42 parameters to three gridbox mean parameters is loss of information when pixels in the grid box have widely varying cloud properties (e.g., bimodal distributions), but examination of instantaneous ISCCP scenes indicates that unimodal distributions occur 71% of the time in our domain. We define a scene as unimodal if the gridbox mean cloud top pressure and optical thickness fall into a cloud top pressure/optical thickness interval that is the same or adjacent to the cloud top pressure/optical thickness interval with the most pixels. One disadvantage of clustering on 42 parameters is that each parameter is treated as being equally distant from the others, and adjacent cloud top pressure/optical thickness intervals are grouped together only if they co-occur in instantaneous ISCCP scenes. We found that clustering on 42 parameters did not produce results that were any more dynamically distinct than clustering on three parameters. Moreover, the 42-parameter method converged to a larger number of solutions for different starting seeds than did the three-parameter method.

[10] Although the 1999–2001 time period is of greatest interest because that is when advective forcing from the CVA is available, the clustering algorithm was applied to 14 years (1988–2001) of the ISCCP data for January–March and November–December to reduce sampling uncertainties. Clusters for January 1999 to March 2001 were determined by matching elements to the nearest 14-year centroid, with no iterative reclustering. The 1999–2001 centroids were nearly identical to those for the entire 14 years, but the cluster frequencies were slightly different. In order to match the temporal resolution of the CVA, January 1999 to March 2001 ISCCP data were linearly interpolated from 3-hourly to 1-hourly before the application of clustering.
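
The two steps in this paragraph, linear interpolation from 3-hourly to 1-hourly values followed by assignment to fixed centroids without reclustering, might look like the sketch below. The centroid and scene values are hypothetical (already scaled to 0–1), and the sketch assumes one unbroken daytime sequence; gaps across nighttime or missing scenes would need separate handling.

```python
import numpy as np

def to_hourly(hours_3h, values_3h):
    """Linearly interpolate each column from 3-hourly to 1-hourly time points."""
    hours_1h = np.arange(hours_3h[0], hours_3h[-1] + 1)
    columns = [np.interp(hours_1h, hours_3h, values_3h[:, j])
               for j in range(values_3h.shape[1])]
    return hours_1h, np.column_stack(columns)

def assign_to_centroids(x, centroids):
    """Label each element with the index of the nearest fixed centroid."""
    dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

# Hypothetical long-term centroids and one daytime sequence of scaled scenes
# (columns: cloud fraction, cloud top pressure, reflectivity).
long_term_centroids = np.array([[0.80, 0.10, 0.20],
                                [0.25, 0.10, 0.20]])
hours_3h = np.array([15, 18, 21])
scenes_3h = np.array([[0.9, 0.1, 0.2],
                      [0.6, 0.1, 0.2],
                      [0.3, 0.1, 0.2]])

hours_1h, scenes_1h = to_hourly(hours_3h, scenes_3h)
hourly_labels = assign_to_centroids(scenes_1h, long_term_centroids)
```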

[11] Table 1 lists cluster centroids for the 1999–2001 time period, ordered according to relative frequency. Conditions with completely clear sky occurred 6% of the time (in the 3-hourly data) and were not clustered. Although, for convenience, we label each cluster with a cloud name, this does not imply that the name is characteristic of every element in the cluster. Pixel frequency distributions as a function of cloud optical thickness and cloud top pressure are plotted for each cluster in Figure 1. The variability in cloud optical thickness and cloud top pressure seen in the histograms results from both the subgrid variability of cloud properties within individual scenes and the variability of the gridbox mean elements around the cluster centroids. The first and second clusters, “extensive cirrus” and “patchy cirrus,” are optically thin with high cloud tops, but have very different cloud fractions. Clusters 3 and 4 (“frontal/nimbostratus” and “stratus/stratocumulus,” respectively) are optically thick, with nearly 100% cloud cover. Although the frontal/Ns reaches into the upper troposphere, St/Sc has the lowest cloud top of any cluster. The fifth cluster is a mixture of clouds at multiple levels in the atmosphere with nearly 100% cloud cover. Examination of individual scenes indicated that clouds occur at a variety of levels at the same time, so the histogram in Figure 1 is not merely an artifact of averaging. This mixed cluster does not separate into high-cloud and low-cloud clusters when more than six clusters are calculated, suggesting it is indeed a distinct regime. Cluster 6, “cumulus/cirrus,” is characterized by a combination of low-level and high-level optically thin clouds, and in this case the gridbox mean cloud top pressure is not representative of individual pixels.

Figure 1.

Frequencies of ISCCP pixels in cloud top pressure and cloud optical thickness intervals for each cluster. The dashed lines indicate the nine ISCCP standard cloud categories.

Table 1. ISCCP Mean Cloud Properties ±1 Standard Deviation for Each Cluster During January–March of 1999–2001 and November–December of 1999–2000
| Cloud Type | Cluster | Mean Cloud Fraction | Mean Cloud Top Pressure, mbar | Mean Reflectivity | Optical Thickness | Frequency, % |
| --- | --- | --- | --- | --- | --- | --- |
| Extensive cirrus | 1 | 0.82 ± 0.14 | 209 ± 74 | 0.11 ± 0.06 | 0.9 | 19 |
| Patchy cirrus | 2 | 0.24 ± 0.15 | 190 ± 98 | 0.07 ± 0.05 | 0.5 | 17 |
| Frontal/nimbostratus | 3 | 1.0 ± 0.03 | 431 ± 116 | 0.68 ± 0.11 | 20.9 | 16 |
| Stratus/stratocumulus | 4 | 0.92 ± 0.11 | 707 ± 86 | 0.58 ± 0.13 | 13.3 | 13 |
| Mixed | 5 | 0.90 ± 0.13 | 432 ± 103 | 0.31 ± 0.07 | 3.7 | 12 |
| Cumulus/cirrus | 6 | 0.33 ± 0.18 | 657 ± 155 | 0.28 ± 0.10 | 3.3 | 6 |
| Clear | | | | | | 6 |

[12] While satellite observations describe well the horizontal distribution of cloud properties, they supply much less information about the profile of overlapping clouds. Knowledge of the vertical cloud distribution can be obtained from the millimeter cloud radar (MMCR), micropulse lidar, and ceilometer instruments located at the ARM SGP Central Facility. The measurements from these instruments have been combined in the Active Remote Sensing of Cloud Layers (ARSCL) data product [Clothiaux et al., 2000], which identifies the presence of cloud at 45 m vertical resolution and provides the top and bottom heights of every cloud layer directly above the instruments. ARSCL 10-s data at 45 m resolution were averaged within 250 m vertical intervals and 30-min periods centered on the hourly time points of the CVA to obtain cloud fraction as a function of height. We assume that the frequency of cloudiness above the instrument during a 30-min interval is identical to the spatial cloud fraction within the surrounding local area (18–36 km for 10–20 m s−1 advection speeds). This is much smaller than the ISCCP domain (280 km), but averaging over longer time intervals can mix in substantial temporal variability in the cloud field [Kim et al., 2005, Appendix A]. Averages of the highest cloud top, lowest cloud base, and integrated thickness of all layers were also calculated. We determined lowest cloud base according to ceilometer measurements (30-s sampling and 8 m vertical resolution) since they do not misidentify precipitation as cloud like the MMCR. This adjustment was not undertaken for the profiles of cloud fraction at each level, which may lead to a slight overestimate of cloud fraction, especially near the surface. At least 50% of the data in a 30-min interval were required to have valid retrievals in order to construct an average (16% of the data were missing from the entire data set). The ARSCL 30-min values were assigned to the same clusters that the coincident ISCCP data were classified in, and then averaged. Although cloud fractions reported by ISCCP and ARSCL may exhibit large disagreement at any given time due to differences in spatial sampling, this effect is random and will be reduced by averaging over many time points in a cluster. Systematic disagreement due to different methods of observation will remain.
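
The reduction of a high-resolution cloud mask to a profile of cloud fraction can be sketched as below. The sketch assumes a hypothetical input layout (a time-by-height mask holding 1 for cloud, 0 for clear, and NaN for missing retrievals within one 30-min window) and applies the 50% validity requirement separately to each 250 m layer, which is one possible reading of the criterion rather than a confirmed detail of the processing.

```python
import numpy as np

def layer_cloud_fraction(mask, heights_m, layer_depth_m=250.0, min_valid=0.5):
    """Cloud fraction per vertical layer for one 30-min window.

    mask      : (n_times, n_heights) array, 1 = cloud, 0 = clear, NaN = missing
    heights_m : (n_heights,) heights of the range gates in meters
    Layers with fewer than `min_valid` valid samples are returned as NaN.
    """
    layer_index = (heights_m // layer_depth_m).astype(int)
    n_layers = layer_index.max() + 1
    fraction = np.full(n_layers, np.nan)
    for k in range(n_layers):
        block = mask[:, layer_index == k]
        valid = np.isfinite(block)
        if block.size and valid.mean() >= min_valid:
            fraction[k] = block[valid].mean()
    return fraction

# Hypothetical window: 180 profiles (10-s sampling for 30 min), 45 m range gates.
heights_m = 22.5 + 45.0 * np.arange(12)   # gate centers from 22.5 m to 517.5 m
mask = np.zeros((180, 12))
mask[:90, 6:11] = 1.0                     # cloud in the 250-500 m layer for half the window
mask[:100, 11] = np.nan                   # mostly missing data in the top gate

fraction = layer_cloud_fraction(mask, heights_m)
```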

[13] Table 2 shows mean ARSCL cloud fraction, highest cloud top, lowest cloud bottom, and integrated cloud thickness of all layers. The cloud fraction listed in Table 2 is determined by the occurrence of cloud at any level. Cluster cloud top heights reported by ARSCL correspond well with those reported by ISCCP. Less agreement occurs for cloud fraction, although the relative variations between clusters are similar. One reason ISCCP might report larger cloud fraction than ARSCL is that pixels are designated as completely cloudy even if the actual cloud is smaller than the pixel size. Another possible reason is that MMCR fails to detect clouds otherwise seen by laser at heights above 8 km about 10–20% of the time [Clothiaux et al., 2000]. Physical cloud thickness is not directly comparable to optical cloud thickness since the latter also depends on the condensate concentration, effective particle size, and water phase. Figure 2 displays average cloud fraction at every level for each cluster. Clouds in the cirrus regimes occur almost entirely above 5 km whereas St/Sc clouds are generally confined to the lowest 3 km. Frontal/Ns cloudiness is horizontally extensive and exists in a deep layer, consistent with having the largest physical and optical thickness of any cluster. The ARSCL profiles for the multilayer clusters (Cu/Ci and mixed) exhibit nonnegligible cloud occurrence over a wide range of vertical levels.

Figure 2.

Profiles of ARSCL mean cloud fraction for each ISCCP cluster. The horizontal lines indicate boundaries of ISCCP cloud top pressure intervals.

Table 2. ARSCL Mean Cloud Properties ±1 Standard Deviation for the Same Times as the ISCCP Clusters
| Cloud Type | Cluster | Cloud Fraction | Cloud Top Height, km | Cloud Base Height, km | Integrated Cloud Thickness, km |
| --- | --- | --- | --- | --- | --- |
| Extensive cirrus | 1 | 0.48 ± 0.66 | 9.4 ± 5.0 | 5.8 ± 3.9 | 3.1 ± 2.7 |
| Patchy cirrus | 2 | 0.13 ± 0.43 | 9.3 ± 5.1 | 6.0 ± 3.2 | 3.0 ± 2.1 |
| Frontal/nimbostratus | 3 | 0.87 ± 0.72 | 8.9 ± 3.8 | 1.4 ± 1.7 | 5.6 ± 3.0 |
| Stratus/stratocumulus | 4 | 0.80 ± 0.86 | 3.6 ± 2.9 | 1.2 ± 1.7 | 1.3 ± 1.1 |
| Mixed | 5 | 0.70 ± 0.81 | 7.5 ± 4.2 | 3.6 ± 2.7 | 2.9 ± 2.3 |
| Cumulus/cirrus | 6 | 0.70 ± 0.81 | 5.0 ± 3.2 | 2.7 ± 2.3 | 1.7 ± 1.4 |

3. Characteristic Dynamics

[14] Dynamical parameters associated with ISCCP 1-hourly interpolated data were obtained from the Constraint Variational Analysis (CVA) [Zhang et al., 2001; Xie et al., 2004], a single-column analysis carried out for a domain approximately the size of a GCM grid box centered on the ARM SGP Central Facility. The CVA constrains numerical weather prediction (NWP) model output with atmospheric soundings and measurements of precipitation, surface energy fluxes, and top-of-atmosphere energy fluxes. Column-integrated mass, water, energy and momentum are conserved by the application of objective analysis techniques. The resulting product provides vertical profiles of atmospheric conditions (horizontal winds, vertical motion, temperature, relative humidity) as well as the tendencies of temperature and water vapor due to large-scale horizontal and vertical advection. SCMs and Cloud System Resolving Models produce more realistic cloud and precipitation simulations when they are forced by advective tendencies from the CVA rather than from the original NWP output [Xie et al., 2003]. CVA data are currently available at 25 mbar spacing between 1000 mbar and 100 mbar for every hour during January 1999 to March 2001.

[15] To provide insight into the atmospheric state and advective forcing associated with the various cloud regimes, we averaged vertical profiles of CVA data over the times corresponding to each cluster. Monthly means were removed from relative humidity (RH) and temperature values prior to averaging to prevent the large basic state decline in RH and temperature with height from dominating the plots. Similarly, advective tendencies of water vapor mixing ratio were divided by the saturation mixing ratio at each level, thus converting them to tendencies in RH under the assumption that temperature remains constant. For consistency, all values of RH and saturation are with respect to liquid water even though saturation with respect to ice may be more applicable in the upper troposphere. We calculated 95% confidence intervals for the cluster means assuming a normal distribution and counting successive hours classified into the same cluster as a single realization. The smallest effective sample size for any cluster is 75 (frontal/Ns). The total number of ISCCP interpolated hourly data contributing to the clusters is 1790. Vertical profiles of perturbation RH are displayed in Figure 3, pressure vertical velocity is displayed in Figure 4, and advective tendencies of water vapor mixing ratio are displayed in Figure 5.
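
The confidence interval calculation, with successive hours in the same cluster counted as one realization, can be sketched as follows. The text does not state how the standard deviation enters, so this sketch assumes the hourly standard deviation is divided by the square root of the number of runs; it also assumes a gap-free hourly label series and a normal-distribution multiplier of 1.96. The function name and sample values are hypothetical.

```python
import numpy as np

def cluster_mean_ci(values, labels, cluster, z=1.96):
    """Cluster mean and 95% half-width using runs as the effective sample size."""
    in_cluster = labels == cluster
    # A run starts wherever an in-cluster hour follows an out-of-cluster hour.
    previous = np.concatenate(([False], in_cluster[:-1]))
    n_eff = int((in_cluster & ~previous).sum())
    sample = values[in_cluster]
    mean = sample.mean()
    half_width = z * sample.std(ddof=1) / np.sqrt(n_eff)
    return mean, half_width, n_eff

# Hypothetical hourly cluster labels and pressure vertical velocity (mbar/h).
labels = np.array([3, 3, 3, 1, 1, 3, 3, 4, 3])
omega = np.array([-40.0, -35.0, -30.0, 5.0, 6.0, -20.0, -25.0, 0.0, -30.0])

mean, half_width, n_eff = cluster_mean_ci(omega, labels, cluster=3)
```

Here six hours fall in cluster 3 but they form only three runs, so the interval is wider than it would be if every hour were treated as independent.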

Figure 3.

Profiles of mean CVA perturbation RH with respect to water (%) for each cluster. The dashed lines indicate 95% confidence intervals.

Figure 4.

As in Figure 3, but for pressure vertical velocity (mbar/h). Negative values correspond to upward motion.

Figure 5.

Profiles of mean CVA horizontal (thick dashed line), vertical (thick dot-dashed line), and total (thick solid line) advection of water vapor mixing ratio for each cluster. Advection values have been normalized by saturation mixing ratio at each level (units are percent of saturation/h). The thin solid lines indicate 95% confidence intervals for total advection.

[16] The mean cloud properties of each cluster are physically consistent with the dynamical forcing. Upper tropospheric RH is higher than normal for the extensive Ci regime (cluster 1) due to upward vertical motion near the 350 mbar level that is increasing the water vapor mixing ratio over time. This positive total advection of water vapor does not occur in the patchy Ci regime (cluster 2), which instead experiences stronger downward motion and drier conditions than extensive Ci. The frontal/Ns regime (cluster 3) is associated with very strong upward motion that is rapidly increasing water vapor mixing ratio by vertical advection and producing a large positive RH perturbation from surface to tropopause. Despite the occurrence of mean ascent throughout the troposphere in the St/Sc regime (cluster 4), a positive anomaly in RH occurs only below the 600 mbar level. Negative horizontal advection of water vapor overwhelms the positive vertical advection to cause net drying above the low-level clouds. Weak upward motion in the mixed cloud regime (cluster 5) produces small positive RH anomalies in the middle and upper troposphere through vertical advection of water vapor. Negative horizontal advection of water vapor dominates in the Cu/Ci regime (cluster 6), and the troposphere is anomalously dry except near the surface and tropopause. Despite the consistency between advective forcing and cloud properties described above, it is important to keep in mind that the mean meteorological conditions may not be characteristic of every element in the cluster.

4. Subgrid Spatial Cloud Variability

[17] Previous studies have found that GCMs have difficulty correctly representing subgrid variability in cloud properties [e.g., Norris and Weaver, 2001; Tselioudis and Jakob, 2002]. This is particularly the case for frontal cloudiness, which typically is too uniformly high and optically thick under conditions of strong ascent. Norris and Weaver [2001] attributed this to the lack of representation of subgrid vertical motions in current GCM parameterizations, since even if grid box mean vertical motion were upward, subgrid variability could result in stronger ascent in one portion of a grid box and weak descent in another portion of the grid box. Because GCMs currently do not consider subgrid variability in vertical motions aside from moist convective parameterizations, grid box mean ascent tends to produce spatially uniform saturation of the entire grid column. Although the role of subgrid variability in vertical motion is difficult to investigate observationally due to lack of reliable data, high-resolution simulations of two synoptic systems passing over the SGP site indeed indicate a strong connection between the mesoscale distribution of upward motion and the mesoscale distribution of cloudiness [Weaver et al., 2005]. Identification of the large-scale forcing associated with mesoscale variability in vertical motion and cloudiness will aid parameterization of these effects. Since computationally intensive simulations are available only for a few short time periods, it is useful to examine in observations how the large-scale meteorological forcing differs between cases of high and low subgrid cloud variability with the same grid box mean cloud properties.

[18] We carried this out by partitioning clusters into subsets based on the spatial standard deviations of cloud reflectivity and cloud top pressure of each element. The “high variability” subset of a cluster then consists of those elements whose standard deviations are above the median values of both parameters for the cluster, and the “low variability” subset consists of those elements whose standard deviations are below the median values. It happens to be the case that the spatial standard deviations of cloud reflectivity are dominated by the gridbox means because the standard deviation must be close to zero when the mean value is close to zero. For this reason we divide cloud reflectivity standard deviations by the gridbox means before partitioning the cluster. The high-variability and low-variability subsets each have slightly more than one quarter of the elements since subgrid variability in cloud reflectivity tends to be positively correlated with subgrid variability in cloud top pressure.
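
The partitioning rule can be written compactly. In this sketch the array names and the eight sample elements are hypothetical; the reflectivity standard deviation is normalized by the gridbox mean before the medians are taken, as described above.

```python
import numpy as np

def variability_subsets(refl_mean, refl_std, ctp_std):
    """Boolean masks for the high- and low-variability subsets of one cluster."""
    rel_refl_std = refl_std / refl_mean          # remove dependence on the mean
    refl_median = np.median(rel_refl_std)
    ctp_median = np.median(ctp_std)
    high = (rel_refl_std > refl_median) & (ctp_std > ctp_median)
    low = (rel_refl_std < refl_median) & (ctp_std < ctp_median)
    return high, low

# Hypothetical elements of one cluster.
refl_mean = np.full(8, 0.5)
refl_std = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40])
ctp_std = np.array([20.0, 40.0, 30.0, 120.0, 100.0, 60.0, 140.0, 160.0])  # mbar

high, low = variability_subsets(refl_mean, refl_std, ctp_std)
```

In this example each subset holds three of the eight elements. If the two variability measures were uncorrelated, each subset would hold about one quarter of the elements; a positive correlation raises that share.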

[19] Figure 6 shows horizontal, vertical, and total water vapor advection for high-variability and low-variability subsets of the frontal/Ns regime (cluster 3). Strong ascent produces positive vertical and net positive total water vapor advection for both subsets. The horizontal water vapor advection, however, is negative for high-variability cases and positive for low-variability cases. The combination of positive vertical and negative horizontal water vapor advection also occurs with the high-variability subsets of the St/Sc (cluster 4) and mixed (cluster 5) cloud regimes (not shown). These results suggest that substantial subgrid variability in cloud top pressure and cloud reflectivity may result from subgrid variability in vertical motion that saturates only part of a grid box otherwise being dried by horizontal advection, although subgrid variability in horizontal moisture advection may also play a role. In contrast, horizontal moistening favors much more uniform saturation and cloud properties.

Figure 6.

Profiles of mean CVA total, horizontal, and vertical water vapor advection (percent of saturation/h) for subsets of high (thin line) and low (thick line) subgrid cloud variability in cluster 3 (frontal/Ns). The dashed lines indicate 95% confidence intervals.

[20] Subsets of high and low subgrid cloud variability are sometimes associated with completely different meteorological conditions. Figure 7 shows this is the case for the St/Sc regime (cluster 4). The low-variability subset resembles cold sector stratocumulus. These clouds occur beneath subsidence that caps a shallow boundary layer and constrains cloud top height and thickness to be relatively uniform. The high-variability subset resembles warm sector stratus. Examination of individual days suggests plumes of moist ascending air advected from the subtropics form stratus clouds of varying heights and thicknesses. High- and low-variability subsets of the extensive cirrus regime (cluster 1) also occur in different meteorological regimes (Figure 8). The low-variability subset is associated with subsidence, a cold troposphere, and a depressed tropopause, presumably in the upper level trough following the passage of a cold front. The high-variability subset is associated with ascent, a warm troposphere, and an elevated tropopause, presumably in an upper level ridge ahead of an approaching cyclone.

Figure 7.

As in Figure 6, except for pressure vertical velocity (mbar/h), temperature (K), and perturbation RH (%) in cluster 4 (St/Sc).

Figure 8.

As in Figure 7, except for cluster 1 (extensive Ci).

5. Global Representativeness

[21] The global representativeness of the results at the SGP site can be assessed by measuring the proximity of locally generated cluster centroids to the SGP centroids. We did so by calculating clusters for 1999–2001 data in each ISCCP grid box using the 14-year (1989–2001) SGP centroids as initial seeds with no iterative reclustering. The average Euclidean distance between these centroids and the SGP centroids was computed for each grid box with weighting by cluster frequency. The resulting values then describe how well cloud regimes around the world resemble cloud regimes at the SGP site. To provide insight into the relative importance of differences from the 14-year SGP centroid, we scaled distances from all grid boxes by the distance between the 1999–2001 SGP centroid and the 14-year centroid. Thus a scaled distance equal to one means clusters at an arbitrary grid box are as close to the 14-year SGP centroid as clusters calculated from three of those 14 years. Table 3 lists 14-year centroids and frequencies at the SGP site, and a comparison with Table 1 demonstrates that differences between centroids calculated over 1989–2001 and centroids calculated over 1999–2001 are small. Figure 9 shows that cloud regimes over many midlatitude land regions and especially over the eastern half of the United States have similar properties to those at the SGP site. This suggests the atmospheric state and advective forcing documented for each SGP cluster are broadly representative of midlatitude continental cool season cloudiness.
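The distance measure described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names and data layout are hypothetical, and the cloud properties are assumed to have been rescaled to comparable ranges before distances are computed.

```python
import math

def nearest_seed(scene, seeds):
    """Index of the seed centroid closest to one scene (Euclidean distance)."""
    return min(range(len(seeds)), key=lambda k: math.dist(scene, seeds[k]))

def weighted_centroid_distance(scenes, seeds):
    """Frequency-weighted mean distance between local centroids and their seeds.

    Scenes are assigned to seeds in a single pass, with no iterative reclustering.
    """
    groups = [[] for _ in seeds]
    for scene in scenes:
        groups[nearest_seed(scene, seeds)].append(scene)
    total = 0.0
    for seed, members in zip(seeds, groups):
        if not members:
            continue  # an unpopulated cluster carries zero weight
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += (len(members) / len(scenes)) * math.dist(centroid, seed)
    return total

def scaled_distance(box_scenes, sgp_scenes, seeds):
    """Distance at an arbitrary grid box relative to the value at the SGP grid box."""
    return (weighted_centroid_distance(box_scenes, seeds)
            / weighted_centroid_distance(sgp_scenes, seeds))
```

Here `seeds` plays the role of the 14-year SGP centroids, `sgp_scenes` the 1999–2001 SGP scenes, and `box_scenes` the 1999–2001 scenes at any other grid box, so a returned value of one corresponds to the scaling described above.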

Figure 9.

Distances of 1999–2001 centroids at each grid box from the 1989–2001 centroids at the SGP grid box, scaled by the 1999–2001 SGP value. Contour interval is 0.5; values less than 1.0 are dark gray and values greater than 2.0 are white. A white square marks the SGP site.

Table 3. ISCCP Mean Cloud Properties for Each Cluster During January–March of 1988–2001 and November–December of 1988–2000
Cloud Type | Cluster | Mean Cloud Fraction | Mean Cloud Top Pressure, mbar | Mean Reflectivity | Frequency, %
Extensive cirrus | 1 | 0.85 | 227 | 0.11 | 20
Patchy cirrus | 2 | 0.23 | 218 | 0.07 | 18
Clear | | | | | 2

6. Model Cloud Comparison

[22] One difficulty with evaluating the quality of GCM cloud simulation is that it is not always clear whether errors in cloud simulation result from incorrect large-scale forcing or an incorrect response of parameterizations to correct large-scale forcing. This problem is mitigated by examining 3-hourly SCM cloud output from runs with CVA forcing, in which the simulated clouds presumably experienced forcing similar to that experienced by the observed clouds. In this study we examine output from an SCM implementing the full physics parameterizations, vertical resolution, and time step of the GFDL AM2 [GFDL Global Atmospheric Model Development Team, 2004], but our procedure could be applied to any model. The GFDL SCM has 24 vertical levels and a prognostic cloud scheme based upon the work by Tiedtke [1993] with stratiform microphysics from Rotstayn [1997] and Rotstayn et al. [2000]. The cumulus parameterization is relaxed Arakawa-Schubert [Moorthi and Suarez, 1992], and the turbulence parameterization is based upon work by Lock et al. [2000]. The model is driven by temperature and moisture forcings specified from the CVA without nudging and by winds specified from observations. The data analyzed in this study come from hours 12 to 36 of 36-hour SCM forecasts that begin every day with the observed sounding. This minimizes the drifts of temperature and moisture that can develop in such SCM simulations [Ghan et al., 2000].
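The forecast-window selection can be expressed compactly. The sketch below assumes 3-hourly output and keeps lead times from 12 h up to, but not including, 36 h, so that consecutive daily forecasts tile the record without overlap. The treatment of the end points, the data layout, and the function name are assumptions, since the text does not specify them.

```python
def stitch_forecasts(forecasts, first_lead=12, last_lead=36):
    """Build one continuous series from daily forecasts.

    forecasts: dict mapping each forecast start time (hours since a reference
    time, one start every 24 h) to a dict {lead_hour: value}.
    Returns {valid_hour: value} using only leads in [first_lead, last_lead).
    """
    series = {}
    for start in sorted(forecasts):
        for lead, value in sorted(forecasts[start].items()):
            if first_lead <= lead < last_lead:
                series[start + lead] = value
    return series
```

With this convention, each valid time is supplied by exactly one forecast, and the first 12 h after each initialization are discarded.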

[23] As was done for the ISCCP data, the 3-hourly SCM cloud data were linearly interpolated to hourly values. Cluster-by-cluster comparison of observed and simulated cloud properties will establish the specific cloud regimes that are well or poorly modeled. For direct comparability with the satellite data, the SCM output was converted into frequency distributions of “pixels” with various values of cloud optical thickness and cloud top pressure using the ISCCP simulator (as described by Klein and Jakob [1999] and Webb et al. [2001]). The ISCCP simulator divided the SCM column into 50 subcolumns and randomly assigned cloud cover or clear sky to each level of each subcolumn such that the model overlap assumption was maintained and the fraction of subcolumns with cloud was the same as the SCM column cloud fraction at each level. The total optical thickness and cloud top pressure of each subcolumn were then calculated from the vertical distribution of condensate amount and effective particle size in a manner consistent with ISCCP retrievals of cloud properties. Each subcolumn was treated as a pixel, and the fraction of pixels in each ISCCP cloud top pressure/optical thickness interval was calculated. Subcolumn cloud optical thickness was sometimes less than that detectable by satellite, defined in this study as optical thickness less than 0.3. These subvisible cloudy pixels were not included in calculations of mean grid box cloud fraction, cloud optical thickness, and cloud top pressure. Conditions of solely subvisible cloudiness in the SCM column occurred 10% of the time, and completely cloudless conditions occurred 18% of the time (in the 3-hourly data).
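The subcolumn step can be illustrated with a short sketch. It is not the ISCCP simulator itself: for simplicity it assumes random overlap between levels (whereas the simulator maintains whatever overlap assumption the host model uses), and the function names and the per-level optical thickness input are hypothetical.

```python
import random

SUBVISIBLE_TAU = 0.3  # optical thickness below this is treated as undetectable

def make_subcolumns(cloud_fraction, n_sub=50, seed=0):
    """Return n_sub subcolumns, each a list of booleans (cloudy or clear) per level.

    Assumes RANDOM overlap: the cloudy subcolumns are drawn independently at
    each level. The number of cloudy subcolumns at a level is
    round(cf * n_sub), so the subcolumn cloud fraction matches the level
    cloud fraction to within 1 / (2 * n_sub).
    """
    rng = random.Random(seed)
    columns = [[False] * len(cloud_fraction) for _ in range(n_sub)]
    for level, cf in enumerate(cloud_fraction):
        n_cloudy = round(cf * n_sub)
        for i in rng.sample(range(n_sub), n_cloudy):
            columns[i][level] = True
    return columns

def column_optical_thickness(column, level_tau):
    """Total optical thickness of one subcolumn: sum over its cloudy levels."""
    return sum(tau for cloudy, tau in zip(column, level_tau) if cloudy)

def detectable_cloud_fraction(columns, level_tau):
    """Fraction of subcolumns ("pixels") at or above the detection threshold."""
    taus = [column_optical_thickness(c, level_tau) for c in columns]
    return sum(t >= SUBVISIBLE_TAU for t in taus) / len(columns)
```

Subcolumns whose summed optical thickness falls below the threshold are simply left out of the detectable fraction, which mirrors the exclusion of subvisible pixels from the grid box means.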

[24] Table 4 lists average SCM cloud properties, and Figure 10 presents SCM pixel frequency distributions for the same times classified into the ISCCP clusters during 1999–2000. Although SCM data were not available for January–March 2001, this has little impact on the overall results since differences between simulated and observed clouds are much larger than differences between averages over 1999–2000 and 1999–2001. Comparison with Table 1 and Figure 1 shows that model clouds are much more optically thick than observed clouds. This is true even if the subvisible clouds are included in the optical thickness average. SCM cloud fraction, however, is generally less than average ISCCP cloud fraction due to the frequent occurrence of completely clear sky or subvisible cloudiness in the SCM at times when clouds were actually observed. Overprediction of cloud optical thickness and underprediction of cloud fraction are common compensating errors in GCMs. Clusters 3, 4, and 5 in Figure 10 also exhibit the typical GCM behavior of producing clouds that are too optically thick and too high in the atmosphere under conditions of grid box mean ascent (Figure 4) [Norris and Weaver, 2001; Tselioudis and Jakob, 2002; Lin and Zhang, 2004; Xie et al., 2005; Xu et al., 2005; Zhang et al., 2005]. The St/Sc cloud regime (cluster 4), which exhibits substantial cloudiness in the middle and upper troposphere despite net advective drying at these levels (Figure 5), has the most egregious error. Since the Tiedtke [1993] cloud parameterization does not have a way to treat the impacts of horizontal advection differently from the impacts of vertical advection, the SCM always generates cloud water and cloud fraction whenever grid box mean ascent occurs and grid box mean RH is greater than 80%. Although substantial variability in cloud properties is evident in the histograms displayed in Figure 10, more variability results from temporal changes in the grid box mean and less from spatial variability within the grid box than is the case for ISCCP. Examination of individual scenes indicated that all cloud pixels occur in a single cloud top pressure interval and a single cloud optical thickness interval three times as often in the SCM as in ISCCP.
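One simple way to quantify this contrast is to count scenes whose cloudy pixels all fall into one histogram bin. The sketch below assumes each scene is stored as a 2-D list of pixel counts indexed by cloud top pressure interval and optical thickness interval, and that completely clear scenes are excluded from the denominator. Both choices, like the function names, are illustrative and not taken from the study.

```python
def is_single_interval(hist):
    """True if all cloudy pixels of a scene fall in exactly one (pressure, tau) bin."""
    occupied = sum(1 for row in hist for count in row if count > 0)
    return occupied == 1

def single_interval_frequency(scenes):
    """Fraction of cloudy scenes whose pixels occupy a single bin."""
    cloudy = [h for h in scenes if any(c > 0 for row in h for c in row)]
    if not cloudy:
        return 0.0
    return sum(is_single_interval(h) for h in cloudy) / len(cloudy)
```

Applying `single_interval_frequency` separately to the simulated and observed histograms and taking the ratio would give a statistic comparable to the factor of three quoted above.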

Figure 10.

Frequencies of SCM pixels in cloud top pressure and cloud optical thickness intervals for each ISCCP cluster. The dashed lines indicate the nine ISCCP standard cloud categories and a special category of subvisible cloudiness (optical thickness < 0.3).

Table 4. SCM Mean Cloud Properties for the Same Times as the ISCCP Cloud Clusters
Cloud Type | Cluster | Mean Cloud Fraction | Mean Cloud Top Pressure, mbar | Mean Reflectivity
Extensive cirrus | 1 | 0.46 ± 0.45 | 291 ± 273 | 0.27 ± 0.23
Patchy cirrus | 2 | 0.28 ± 0.40 | 327 ± 288 | 0.29 ± 0.24
Frontal/nimbostratus | 3 | 0.90 ± 0.26 | 371 ± 186 | 0.74 ± 0.24
Stratus/stratocumulus | 4 | 0.84 ± 0.32 | 579 ± 239 | 0.73 ± 0.21
Mixed | 5 | 0.72 ± 0.40 | 388 ± 257 | 0.48 ± 0.29
Cumulus/cirrus | 6 | 0.45 ± 0.44 | 461 ± 333 | 0.41 ± 0.31

[25] Model cloudiness might differ from coincident observed cloudiness if there were a delay in the SCM response to observed forcing or if the CVA did not correctly represent the observed forcing. To account for this possibility, we carried out a statistical evaluation of the SCM distribution of cloud properties. This was accomplished by classifying the SCM elements into their own clusters by performing the clustering routine on the grid box mean SCM cloud properties. Table 5 lists the resulting SCM cluster centroids, and Figure 11 displays pixel frequency distributions as a function of cloud optical thickness and cloud top pressure for each cluster. Note that the SCM clusters have a different order and frequency than the ISCCP clusters. Patchy Ci and Cu/Ci regimes occur much less frequently in the SCM than in ISCCP, presumably because the SCM too often produces clear sky instead of partial cirrus cloudiness. This underestimation still exists even if subvisible cloudiness is included in the clusters. Although the greater frequency of completely clear sky in the SCM causes the relative frequency of frontal/Ns and St/Sc regimes to increase, they still occur with approximately the same absolute frequency as in the ISCCP clusters. Although SCM cluster cloud fractions are comparable to the ISCCP clusters, the optical thickness values are much larger. Figure 11 indicates that most of the SCM clusters have less variability in cloud optical thickness and/or cloud top pressure than do the ISCCP clusters, consistent with the general tendency for lesser subgrid spatial variability in individual SCM scenes.
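For readers unfamiliar with the clustering step, a bare-bones k-means loop is sketched below. It is a generic textbook version rather than the routine used in the study: the seeding, the stopping rule, and the assumption that cloud fraction, cloud top pressure, and reflectivity have been rescaled to comparable ranges before clustering are all assumptions made here for illustration.

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the solution has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def cluster_frequencies(labels, n_clusters):
    """Relative frequency (percent) of each cluster."""
    return [100.0 * labels.count(k) / len(labels) for k in range(n_clusters)]
```

Each element of `points` would hold the grid box mean properties of one SCM scene, and the returned centroids and frequencies correspond to the kind of quantities listed in Table 5.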

Figure 11.

As in Figure 10, but for clusters generated from an independent clustering of SCM mean properties.

Table 5. Mean Cloud Properties for Clusters Derived From the SCM Output
Cloud Type | Cluster | Mean Cloud Fraction | Mean Cloud Top Pressure, mbar | Mean Reflectivity | Frequency, %
Extensive cirrus | 1 | 0.88 ± 0.14 | 148 ± 75 | 0.17 ± 0.08 | 14
Patchy cirrus | 2 | 0.23 ± 0.13 | 105 ± 86 | 0.08 ± 0.06 | 8
Frontal/nimbostratus | 3 | 1.0 ± 0.02 | 369 ± 122 | 0.87 ± 0.07 | 18
Stratus/stratocumulus | 4 | 0.98 ± 0.06 | 795 ± 96 | 0.69 ± 0.17 | 16
Mixed | 5 | 0.93 ± 0.12 | 368 ± 120 | 0.45 ± 0.12 | 10
Cumulus/cirrus | 6 | 0.37 ± 0.21 | 780 ± 110 | 0.61 ± 0.18 | 4
Clear/subvisible | | | | | 30

7. Summary and Discussion

[26] This study demonstrates how satellite observations of midlatitude cool season continental cloudiness can be grouped into distinct cloud regimes by application of a k-means clustering algorithm to grid box mean cloud fraction, cloud reflectivity, and cloud top pressure. These regimes correspond to typical cloud types associated with various synoptic conditions over land during winter: extensive cirrus, patchy cirrus, frontal/nimbostratus, stratus/stratocumulus, cumulus/cirrus, and a mixture of clouds at a variety of levels. Averages of ground-based retrievals of cloud fraction, cloud height, and cloud thickness are consistent with the satellite cloud distributions and provide additional insight into the vertical structure of the cloud regimes. Consistency is also found between cloud properties of each regime and vertical profiles of meteorological parameters averaged over a domain approximately the size of a GCM grid box and constrained by a dense network of observations to conserve column-integrated mass, water, energy, and momentum. In particular, a close relationship is found between mean vertical profiles of water vapor advection, relative humidity, and cloudiness for each regime. We investigated cloud properties over the ARM SGP site since it was one of the few locations on Earth with accurate observations of water vapor advection, but our general results should be applicable to many midlatitude land regions around the globe.

[27] A primary motivation for this study is the diagnosis of errors in GCM simulations of specific cloud regimes. To this end we carried out an analogous classification of cloud output from the SCM version of the GFDL AM2 model and compared properties of the resulting cloud regimes with those of the observations. Since the SCM was forced with realistic large-scale boundary conditions, simulated cloudiness should be similar to coincident observed cloudiness if the model correctly parameterizes subgrid processes. An alternative diagnostic method is to calculate cloud regimes separately for the SCM output and the observations and compare the mean properties of the cloud regimes. Both methods indicate cloud optical thickness is too large and completely clear sky is too frequent in the model, as is the case for other GCMs and SCMs [e.g., Norris and Weaver, 2001; Tselioudis and Jakob, 2002; Lin and Zhang, 2004; Xie et al., 2005; Xu et al., 2005; Zhang et al., 2005]. The SCM appears to reproduce the correct absolute frequencies of frontal/Ns and St/Sc regimes, albeit with clouds that are too bright, but other cloud regimes that include optically thin cirrus are underproduced.

[28] Another feature of the simulated clouds is their lack, relative to the observations, of subgrid spatial variability in cloud optical thickness and cloud top pressure, especially for the frontal/Ns regime. The GFDL SCM and other models tend to saturate the entire grid column under conditions of grid box mean ascent, thus producing a uniform and very optically thick cloud. Possible reasons for this were explored using observed cloud and meteorological data. Division of the observed frontal/Ns regime into subsets of high and low subgrid spatial cloud variability indicates high subgrid variability is associated with negative horizontal water vapor advection whereas low subgrid variability is associated with positive horizontal advection. In both subsets, vertical water vapor advection is positive, a result of large-scale ascent. This finding suggests that models do not sufficiently represent the effects of horizontal advection of dry air in a vertically moistened column. If the vertical moistening is nonuniform, due to subgrid variability in vertical velocity, then substantial subgrid variability in cloudiness results. Weaver et al. [2005] showed, using high-resolution simulations, that storms over this site indeed exhibited substantial mesoscale (and smaller-scale) variability in vertical motion and that this variability was strongly correlated with similar variability in clouds. More research must be carried out to identify the large-scale meteorological forcing and subgrid processes responsible for producing various observed cloud regimes and how these can be better represented in GCMs.


[29] This work was supported by DOE Atmospheric Radiation Measurement Program grant 63332-1018730-0007407. The authors appreciate useful comments from two anonymous reviewers.