Abstract
 Abstract
 1. Introduction
 2. Data
 3. Methodology
 4. Results
 5. Discussion
 6. Summary
 Acknowledgments
 References
 Supporting Information
[1] The development of realistic cloud parameterizations requires accurate characterizations of subgrid distributions of thermodynamic variables. To this end, cloud liquid water content (CLWC) distributions are characterized with respect to cloud phase, cloud type, precipitation occurrence, and geolocation using CloudSat radar measurements. The probability density function (PDF) of CLWC is estimated using maximum likelihood estimation. The best-estimated PDF of CLWC is found to follow either a gamma or a lognormal distribution depending on temperature (cloud phase), cloud type, the occurrence of precipitation, and geolocation. The data sampling with respect to cloud phase and precipitation significantly affects the distributional characteristics of CLWC in some regions. In the lower to midtroposphere (altitudes of 1–6 km) in the tropics and subtropics, where nonprecipitating and pure liquid phase clouds are dominant, the PDFs of CLWC are best described by lognormal distributions. In contrast, at altitudes above 6 km and in regions poleward of the midlatitudes, the CLWC more closely resembles a gamma distribution that coincides with a high frequency of occurrence of supercooled liquid clouds containing low CLWC values. When the contributions of supercooled water and precipitation are removed, the CLWC PDFs transition from gamma to lognormal distributions in two areas: (1) the high-altitude and middle-to-polar latitude regions where the contribution of supercooled cloud is significant and (2) the lower troposphere where precipitation is frequently detected. Although the CloudSat radar does not sample all cloud hydrometeors, coherent regional and cloud type dependences of CLWC distributional characteristics are observed that may provide useful constraints for cloud parameterizations in climate models.
1. Introduction
[2] The crude treatment of subgrid-scale cloud processes in current climate models is widely recognized as a major limitation in predictions of global climate change. Typical climate models have a horizontal resolution on the order of 100 km and a variable vertical resolution between 100 m and 1 km. Since climate models cannot explicitly resolve what happens at the subgrid scales, the physics must be parameterized as a function of the resolved motions. A fundamental problem of cloud parameterization is to characterize the distributions of cloud variables at subgrid scales and to relate the subgrid variations to the resolved flow. In particular, the subgrid distributions of liquid water content play a key role in modern cloud microphysics parameterizations [e.g., Morrison and Gettelman, 2008].
[3] Cloud parameterizations based on probability density functions (PDFs) of moist conserved variables (e.g., total water content) have been advocated for some time [e.g., Sommeria and Deardorff, 1977], but only simplified versions have been implemented in weather and climate prediction models [e.g., Tompkins, 2002; Teixeira and Hogan, 2002; Chaboureau and Bechtold, 2002]. PDF approaches have also been used for the development of stochastic parameterizations associated with turbulence, convection, and cloud-radiation interaction [e.g., Barker, 2002; Golaz et al., 2002a; Teixeira and Reynolds, 2008].
[4] However, there is no clear consensus on the optimal type and number of basic shapes for the PDF estimation of cloud and thermodynamic properties for different cloud types. For example, aircraft observations [e.g., Larson et al., 2001a] and large-eddy simulation models [e.g., Cuijpers and Bechtold, 1995] support the view that a Gaussian PDF for moist conserved variables is a realistic approximation for most stratus cloud regimes. For cumulus clouds, the skewness of these PDFs plays an important role in determining the cloud and thermodynamic properties, so others have suggested different PDF types such as the beta distribution [Tompkins, 2002], the double Gaussian distribution [Golaz et al., 2002a, 2002b], and the generalized lognormal distribution [Bony and Emanuel, 2001].
[5] The characterization of distributions (PDFs) of cloud variables has been studied using aircraft data [e.g., Ek and Mahrt, 1991; Wood and Field, 2000; Larson et al., 2001b], tethered balloon data [e.g., Price, 2001], satellite data [e.g., Wielicki and Parker, 1994; Barker et al., 1996], and cloud-resolving or large-eddy simulation models [e.g., Bougeault, 1981; Lewellen and Yoh, 1993; Xu and Randall, 1996a, 1996b]. However, the observational data used in the previous studies have limitations. Aircraft and tethered balloon data provide only one-dimensional paths in a few selected locations and do not provide global coverage. Although satellite data can provide global coverage, previous studies based on infrared (IR) and visible imagery [e.g., Wielicki and Parker, 1994; Barker et al., 1996] do not provide vertically resolved cloud structure information. Recently Kahn and Teixeira [2009] analyzed the scaling behavior of vertically resolved variance of thermodynamic variables from AIRS. However, AIRS is an IR instrument and as such produces retrievals (using some microwave information) of vertical profiles of temperature and water vapor, but not the profiles of cloud properties.
[6] Cloud-resolving models (CRMs) can be extremely useful in providing information about the small-scale dynamics because of their high resolution (on the order of 1 km in the horizontal). However, CRMs have their own shortfalls mostly because a significant portion of the cloud dynamics (e.g., turbulence) and most of the cloud physics (e.g., microphysical processes such as precipitation) are still poorly represented [e.g., Marchand et al., 2009; Redelsperger et al., 2000; Bechtold et al., 2000, and references therein].
[7] Recently, NASA's CloudSat satellite has provided vast amounts of unprecedented high-resolution information on cloud hydrometeors at climate model subgrid scales, enabling characterization of the distributional properties of cloud variables. CloudSat flies in formation with the A-Train constellation [Stephens et al., 2002]. It carries a W-band (94 GHz) cloud profiling radar (CPR) that provides vertically resolved information on cloud ice and liquid water content, precipitation, cloud classification, radiative fluxes, and heating rates [Stephens et al., 2008] at a vertical resolution of 480 m oversampled to 240 m, with a footprint of 1.4 km cross track by 2.5 km along track [Mace et al., 2007]. The CPR resolves details in vertical cloud structure that IR sounders or visible imagers are unable to quantify.
[8] In this paper, vertically resolved cloud liquid water content (CLWC) profiles retrieved from CloudSat [Stephens et al., 2008; Austin et al., 2009] are used to characterize the subgrid distribution of CLWC. The CLWC profiles are organized by temperature (an oversimplified, but useful surrogate for cloud phase), cloud type, precipitation occurrence, and geolocation, and the dependence of the CLWC distributions on these parameters is investigated. The characteristics of CLWC distributions are quantified by utilizing a statistical method for the parametric estimation of the probability density function (PDF) [Kay, 1993]. Results for the best estimation of the CLWC PDF with respect to cloud phase, cloud type, precipitation occurrence, latitude, longitude, and height are presented, and physical interpretations of the results are discussed.
2. Data
[9] Ideally, for the characterization of CLWC subgrid distributions one should generate the “snapshot” PDF, where only the variability in space is taken into account in the statistics without introducing the variability in time. The snapshot PDF is an instantaneous state of CLWC distribution in space and is appropriate for the development of cloud parameterizations in climate models. However, CloudSat measurements do not provide enough data to generate the snapshot PDF in a typical climate model grid box. The accumulation of CloudSat data over time, which involves a wide range of atmospheric conditions, is unavoidable in order to draw any inferences about the PDFs. As a consequence, the CLWC distribution represented by the accumulated CloudSat data is expected to be broader than the “snapshot” distribution.
[10] The present study uses CloudSat data measured during 4 weeks. A representative set of CLWC data is obtained from the CloudSat 2B-CWC-RO version R04 data product by collecting data retrieved from measurements on 1–7 January, 1–7 April, 1–7 July, and 1–7 October, which provide weeklong samples for each season. We find that the seasonal variations are insignificant compared to the other dependent variables considered here (e.g., cloud type, cloud phase, precipitation, geolocation). Therefore, the weeklong data sets are combined and used together to investigate the CLWC distributions for the results that follow. Note that the CloudSat 2B-CWC-RO version R04 data product contains profiles with unsuccessful retrievals of the liquid water content. Only successful retrievals are used in the present analysis.
[11] For the horizontal and vertical distributions, the CLWC data are partitioned by latitude (10° bins), longitude (10° bins), and height (1 or 2 km bins). The CLWC data are also organized and/or filtered in terms of retrieved data quality (associated with cloud phase and precipitation) and cloud type.
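The partitioning described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the bin origins (latitude from −90°, longitude from −180°, height from 0 km), and the tuple layout of a sample are all assumptions made for the example.

```python
import math

def grid_box(lat_deg, lon_deg, height_km, dlat=10.0, dlon=10.0, dz=1.0):
    """Return (lat_bin, lon_bin, height_bin) indices for one CLWC sample.

    Assumed bin origins: latitude -90, longitude -180, height 0 km.
    dz may be set to 1.0 or 2.0 km, the two bin heights mentioned in the text.
    """
    i = int(math.floor((lat_deg + 90.0) / dlat))
    j = int(math.floor((lon_deg + 180.0) / dlon))
    k = int(math.floor(height_km / dz))
    # Place the closed upper edges (lat = 90, lon = 180) in the last bin.
    i = min(i, int(round(180.0 / dlat)) - 1)
    j = min(j, int(round(360.0 / dlon)) - 1)
    return i, j, k

def partition(samples):
    """Group hypothetical (lat, lon, height_km, clwc) tuples by grid box."""
    boxes = {}
    for lat, lon, height_km, clwc in samples:
        boxes.setdefault(grid_box(lat, lon, height_km), []).append(clwc)
    return boxes
```

Each value of the returned dictionary is the list of CLWC samples in one grid box, which is the unit on which a distribution would then be fitted.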
[12] One factor that affects the interpretation of CloudSat-retrieved CLWC is supercooled liquid/mixed phase clouds. CloudSat is unable to independently determine the cloud phase in any given vertical bin. As a result, CloudSat employs a simple scheme to partition the radar measurements into ice and liquid phases. In this scheme, the portion of the profile colder than −20°C is deemed pure ice, the portion of the profile warmer than 0°C is considered pure liquid, and in between is partitioned linearly into ice and liquid phases with a smooth transition from all ice at −20°C to all liquid at 0°C [CloudSat Data Processing Document, 2007a]. The simple linear interpolation does not accurately capture mixed phase cloud structure, which is complex and uncertain [e.g., Nasiri and Kahn, 2008].
[13] Another factor that potentially impacts the fidelity of CloudSat-retrieved CLWC is the presence of large droplets [CloudSat Data Processing Document, 2007a]. The radar reflectivity is sensitive to the sixth power of the diameter of the droplet, and the retrieved CLWC in the presence of precipitation often exceeds the applicability of the retrieval algorithm. The CLWC retrieval algorithm assumes a lognormal distribution of droplet sizes, and possible departures from this assumption degrade the accuracy of the retrieval. In the presence of precipitation or drizzle, the cloud droplet distribution departs from the lognormal distribution, violating the assumption of the retrieval algorithm. Therefore, the retrieved CLWC in precipitating clouds is likely to have larger errors than in nonprecipitating clouds.
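As a minimal sketch of the linear phase partition described above (pure ice below −20°C, pure liquid above 0°C, linear in between), the liquid fraction can be written as a simple ramp. The function name and its default arguments are illustrative assumptions; the operational CloudSat algorithm may differ in detail.

```python
def liquid_fraction(temp_c, t_ice=-20.0, t_liquid=0.0):
    """Fraction of condensate assigned to the liquid phase at temperature temp_c (deg C).

    Sketch of the linear ramp described in the text, not the operational code.
    """
    if temp_c <= t_ice:
        return 0.0  # colder than -20 C: treated as pure ice
    if temp_c >= t_liquid:
        return 1.0  # warmer than 0 C: treated as pure liquid
    return (temp_c - t_ice) / (t_liquid - t_ice)
```

Under this scheme any condensate between −20°C and 0°C contributes a nonzero, temperature-weighted amount to the retrieved CLWC, which is why the supercooled range can populate the low end of the CLWC distribution.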
[14] The CloudSat retrieval assumes a distribution with a single cloud particle mode. When both precipitation and cloud water are present in the same volume at the same time, this assumption is not valid. Furthermore, when the observed reflectivity is larger than the range allowed by the a priori data (a frequent occurrence when large raindrops are present) or results in water contents with very large attenuation not matched by observations, the retrieval will not converge. Therefore, radar profiles with moderate to heavy rainfall are usually nonconvergent. However, drizzle and light rain can increase the reflectivity without making the retrieved CLWC values so large that they exceed the range allowed by the a priori assumptions. As a result, the CloudSat retrievals of CLWC are likely biased high by light rain and drizzle, which the precipitation flag does not completely filter out.
[15] Considering the impacts of cloud phase and precipitation on the accuracy of retrieved CLWC, the data are organized in four ways:
[16] 1. All cloud set (AC): includes all successfully retrieved CLWC data.
[17] 2. Nonprecipitating cloud set (NP): includes only nonprecipitating CLWC data.
[18] 3. Liquid-phase cloud set (LP): includes only pure liquid phase CLWC data (T > 0°C).
[19] 4. Nonprecipitating and pure liquid phase cloud set (NP + LP): includes only nonprecipitating and pure liquid phase CLWC data.
[20] In defining the NP data set, we use the “Precipitation Flag” given in the CloudSat 2B-CLDCLASS version R04 data product. The Precipitation Flag is determined by checking the maximum reflectivity and attenuation of surface signals due to precipitation [CloudSat Data Processing Document, 2007b]. The accuracy of this precipitation detection method is limited by the effect of surface return signal and vertical resolution. NP consists of data with the precipitation flag of 00.
[21] To produce the LP data set, we use the “Temperature” given in the CloudSat ECMWF-AUX version R04 data product [CloudSat Data Processing Document, 2007c]. The CloudSat retrieval process uses this temperature to decompose the liquid and ice water contents in the mixed phase clouds. LP consists of data with temperature >0°C.
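The four data sets can be sketched as simple filters on the precipitation flag and temperature. The record layout (dictionaries with the keys `clwc`, `precip_flag`, and `temp_c`) is a hypothetical convention chosen for this example and is not the structure of the CloudSat products.

```python
def build_subsets(samples):
    """Split successfully retrieved CLWC samples into AC, NP, LP, and NP+LP.

    Each sample is a dict with hypothetical keys:
      clwc        retrieved cloud liquid water content
      precip_flag precipitation flag (0 stands for the value 00 in the text)
      temp_c      temperature in deg C
    """
    def is_np(sample):
        return sample["precip_flag"] == 0   # nonprecipitating

    def is_lp(sample):
        return sample["temp_c"] > 0.0       # pure liquid phase

    return {
        "AC": list(samples),
        "NP": [s for s in samples if is_np(s)],
        "LP": [s for s in samples if is_lp(s)],
        "NP+LP": [s for s in samples if is_np(s) and is_lp(s)],
    }
```

By construction NP+LP is the intersection of NP and LP, so it is never larger than either of them.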
[22] CLWC data in this study are also partitioned by cloud type using the cloud classification data reported in the CloudSat 2B-CLDCLASS version R04 data product. CloudSat classifies clouds into altostratus (As), altocumulus (Ac), nimbostratus (Ns), stratus (St), stratocumulus (Sc), cumulus (Cu), deep convective (Cb), or high cirrus and cirrostratus (Ci) cloud by using characteristics of hydrometeor vertical and horizontal scales, radar reflectivity, precipitation, and ancillary data including temperature profiles and surface topography height [Sassen and Wang, 2008]. The present study considers six cloud types (As, Ac, Ns, Sc, Cu, and Cb) but omits all Ci occurrences as they are likely to be a result of cloud misclassification or a byproduct of the linear cloud phase assignment. Stratus clouds are also excluded because of their very low occurrence frequency (<0.1% in all cloud-classified data), which leads to insufficient sampling.
[23] Sassen and Wang [2008] demonstrated that the CloudSat classification is generally consistent with previous global cloud type distributions but with some differences that are due to limitations of the CloudSat measurements. The main limitation is that the radar is insensitive to clouds containing relatively small particles and the lowest three or four radar bins above the surface (<1 km) are contaminated by surface returns [CloudSat Data Processing Document, 2007b]. Therefore, small fair weather cumulus, altocumulus, and cold cirrus clouds are likely to be underrepresented in the CloudSat data [Sassen and Wang, 2008]. Furthermore, the rule-based classification is sensitive to the selection of the thresholds, which can lead to frequent misclassifications for cases near the thresholds.
[24] The present analysis uses only converging profiles of CloudSat data, removing a considerable fraction of the cloud profiles that do not converge to a solution. For the data period we considered, the percentage of profiles retained is 72% out of all the cloud profiles. In particular, fewer retrievals converge for heavily precipitating clouds such as Cu, Ns, and Cb than for nonprecipitating or lightly precipitating clouds such as As, Ac, and Sc. As a result, the relative occurrence frequencies of cloud types in the converged profiles are different from those in cloud-classified profiles as shown in Table 1. Therefore, the CloudSat-retrieved CLWC is likely to underrepresent (overrepresent) the contribution of the heavily precipitating (nonprecipitating or lightly precipitating) clouds to the “true” CLWC distribution. The precipitating/nonprecipitating portion of each cloud type also differs to some degree as listed in Table 1.
Table 1. Relative Occurrence Frequency (ROF) of Cloud Type Among Seven Cloud Types Considered and Configurations of Precipitating (P) and Nonprecipitating (NP) Cases of Each Cloud Type for CloudSat Cloud-Classified Data and for CloudSat CLWC Converged Data^{a}
Cloud Type  Cloud-Classified ROF (%)  Cloud-Classified P (%)  Cloud-Classified NP (%)  CLWC Converged ROF (%)  CLWC Converged P (%)  CLWC Converged NP (%)  Retrieval Success Rate (%)
As  27  0.1  99.9  38  0.1  99.9  83
Ac  6  6.4  93.6  8  4.3  95.7  75
Sc  13  40.5  59.5  19  33.1  66.9  85
Cu  3  59.1  40.9  2  67.8  32.2  33
Ns  27  70.8  29.2  27  63.2  36.8  58
Cb  10  68.2  31.8  6  63.9  36.1  37
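The shift in relative occurrence frequency between the cloud-classified and converged data in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that the converged ROF of each type is proportional to its cloud-classified ROF multiplied by its retrieval success rate, renormalized over the six types listed. This proportionality is an assumption made for illustration, not a statement from the text.

```python
# Values copied from Table 1 (percent).
classified_rof = {"As": 27, "Ac": 6, "Sc": 13, "Cu": 3, "Ns": 27, "Cb": 10}
success_rate = {"As": 83, "Ac": 75, "Sc": 85, "Cu": 33, "Ns": 58, "Cb": 37}

def expected_converged_rof(rof, success):
    """Renormalized ROF after weighting each cloud type by its success rate."""
    weights = {k: rof[k] * success[k] / 100.0 for k in rof}
    total = sum(weights.values())
    return {k: 100.0 * w / total for k, w in weights.items()}

for cloud, value in expected_converged_rof(classified_rof, success_rate).items():
    print(cloud, round(value))
```

Under this assumption the rounded values agree with the converged ROF column of Table 1, which illustrates how the lower success rates of Cu, Ns, and Cb reduce their weight in the retrieved CLWC statistics.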
3. Methodology
[25] The properties of CLWC PDFs retrieved from CloudSat are quantified using a statistical parametric approach called maximum likelihood estimation (MLE) [Kay, 1993]. MLE finds the best parameter set for an assumed PDF functional form, that is, the set that maximizes the probability (likelihood) of generating the given data set from that PDF. MLE makes maximum use of the information in a data set, is statistically robust against noise, and provides the lowest possible variance of parameter estimates as the sample size increases.
[26] In MLE, the likelihood L of sampling a data set {x_{i}} of N samples for an assumed PDF f(x) is given by
$$ L = \prod_{i=1}^{N} f(x_i). \qquad (1) $$
This likelihood L is maximized with respect to the parameters of the PDF f(x). For example, if f(x) is a Gaussian function, the parameters are the mean and the variance of the Gaussian function. Numerically, the PDF parameters that maximize the likelihood L are found using Newton's method iteratively [Press et al., 1988].
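A minimal sketch of such a fit, using only the Python standard library, is given below. It treats the likelihood as the product of f(x_i) over the samples and maximizes its logarithm. For the lognormal distribution the maximum is available in closed form; for the gamma distribution the shape parameter is found with Newton's method. The digamma and trigamma approximations, the starting guess, and all function names are choices made for this example and are not taken from the original implementation.

```python
import math

def digamma(x):
    """Digamma function psi(x), x > 0: upward recurrence plus asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series

def trigamma(x):
    """Trigamma function psi'(x), x > 0: upward recurrence plus asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42)) * inv2 / x
    return shift + 1.0 / x + 0.5 * inv2 + series

def fit_lognormal(data):
    """Closed-form MLE: mean and standard deviation of ln(x)."""
    logs = [math.log(x) for x in data]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

def fit_gamma(data, tol=1e-10, max_iter=50):
    """MLE of gamma shape alpha and scale beta (data must be positive, not all equal)."""
    n = len(data)
    mean = sum(data) / n
    s = math.log(mean) - sum(math.log(x) for x in data) / n  # s > 0 by Jensen's inequality
    # Common closed-form starting guess for the shape parameter.
    alpha = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(max_iter):
        # Newton step on g(alpha) = ln(alpha) - psi(alpha) - s = 0.
        g = math.log(alpha) - digamma(alpha) - s
        dg = 1.0 / alpha - trigamma(alpha)
        new_alpha = alpha - g / dg
        if new_alpha <= 0.0:
            new_alpha = 0.5 * alpha  # guard against overshooting below zero
        converged = abs(new_alpha - alpha) < tol * alpha
        alpha = new_alpha
        if converged:
            break
    return alpha, mean / alpha  # the scale follows from mean = alpha * beta
```

Working with the log-likelihood rather than L itself avoids numerical underflow when the number of samples is large.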
[27] Various functional forms including Gaussian, beta, Weibull, exponential, lognormal, and gamma distribution functions are investigated in order to determine an appropriate function for CLWC. The candidate functions are selected based on (1) the scope of the shapes that the function can generate and (2) the simplicity and compatibility of the functional form with the climate model parameterizations. Since the number of parameters in the functional form directly determines the complexity of the climate model parameterization, we limit our selections to one- or two-parameter functions and exclude any mixture distributions. Using a simple and single distribution is a first step toward understanding the structure of CLWC distribution.
[28] Among the selected distribution functions, it is found that the gamma and lognormal distribution functions are the most appropriate for the variety of CLWC data sets. Figure 1 illustrates the qualitative assessments of the goodness of fit with the selected functional forms that we tested for two typical cases. Figure 1a demonstrates the case for which the gamma distribution is the best function for the CLWC distribution in latitude [40°N, 50°N], longitude [130°E, 140°E], and height of 1–3 km. Similarly, Figure 1b shows the case for which the lognormal distribution is the best distribution function for the CLWC distribution in latitude [10°N, 20°N], longitude [120°E, 130°E], and height of 1–3 km.
[29] We also quantify the extent to which a given functional form represents a data distribution by comparing the maximum likelihood obtained with MLE for each functional form. The theoretical upper limit of the likelihood one can achieve for a given data set is
$$ L_{\max} = \prod_{i=1}^{M} \left[ \frac{h(x_i)}{N} \right]^{h(x_i)} $$
where h(x_{i}) is the frequency (i.e., number of occurrences) of CLWC value x_{i}, N is the number of CLWC data samples, and M is the number of different CLWC values in the distribution. The maximum likelihood of a given functional form is given by equation (1) with the best fit functional parameters.
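As a hedged illustration, the upper limit can be computed from the counts of distinct values, assuming that it is the likelihood of the empirical distribution itself, in which each distinct value x_i is assigned probability h(x_i)/N. That reading, the function name, and the use of the logarithm are assumptions of this sketch.

```python
import math
from collections import Counter

def log_likelihood_upper_limit(data):
    """Log of the assumed upper limit: sum over distinct values of h * ln(h / N)."""
    counts = Counter(data)   # h(x_i) for each of the M distinct values
    n = len(data)            # N, the total number of samples
    return sum(h * math.log(h / n) for h in counts.values())
```

The result is never positive and equals zero only when every sample has the same value. Comparing it with the log-likelihood of a continuous PDF requires a common discretization of the data, which is not specified here.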
[30] By comparing the maximum likelihood of each functional form with the theoretical upper limit of the likelihood, we devise a quantitative measure of goodness of fit as follows,
where f is a given functional form to fit the distribution. The quantity Q captures the relative distance of the goodness of fit from that of the theoretical upper limit. The smaller Q is, the better the function is fit to the data distribution. Table 2 shows Q values for all the tested functions with the CLWC distributions shown in Figure 1. These numbers confirm the qualitative assessments taken from Figure 1. Note that the beta distribution is also a good fit for the CLWC distribution of Figure 1a, although the gamma distribution is slightly better than the beta distribution.
Table 2. Quantitative Assessment of the Goodness of Fit With Selected Distribution Functions for CLWC Distributions in Region A of Latitude [40°N, 50°N], Longitude [130°E, 140°E], and Height [1 km, 3 km] and in Region B of Latitude [10°N, 20°N], Longitude [120°E, 130°E], and Height [1 km, 3 km]^{a}
Distribution Function  Region A  Region B
Gamma  0.0067  0.0184
Lognormal  0.0240  0.0125
Exponential  0.0489  0.0549
Gaussian  0.0961  0.0480
Weibull  0.0088  0.0269
Beta  0.0069  0.0192
Uniform  0.5628  0.4611
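[31] The gamma distribution function is characterized by two parameters, α and β,
$$ f(x; \alpha, \beta) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)}, \quad x > 0. $$
Parameter α determines the shape of the gamma distribution function, while parameter β determines the scale of the function. Similarly, the lognormal distribution function is characterized by two parameters, μ and σ,
$$ f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left[ -\frac{(\ln x - \mu)^{2}}{2\sigma^{2}} \right], \quad x > 0. $$
Parameters μ and σ are the mean and standard deviation of the variable's natural logarithm.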
[32] The main difference between the gamma and lognormal distributions appears at small values of the variable, where the gamma distribution exhibits a polynomial dependence but the lognormal distribution exhibits an exponential dependence. Both distribution functions are fit to the distribution of CLWC, and the maximum likelihood values of the two distributions are compared to determine which distribution quantitatively fits CLWC better.
[33] The characterization of a CLWC distribution requires the number of samples in a data set of interest to be large enough for a statistically meaningful analysis. When the number of samples is smaller than 1000, we find that the distribution shape is not distinct enough to determine the best fit distribution function. Therefore, when the number of samples in a data set of interest (e.g., data in a given latitude, longitude, and height grid box) is smaller than 1000, we do not perform the MLE of the distribution and mask out the grid box in the figures.
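The contrast at small values can be illustrated numerically with the standard two-parameter gamma (shape α, scale β) and lognormal (μ, σ) densities. The parameter values below are the As entries of Table 3 (α = 1.0, β = 77 for AC; μ = 5.0, σ = 0.63 for LP). They come from fits to different data subsets and are used here only to show how differently the two functional forms behave as x approaches zero.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta, for x > 0."""
    log_pdf = ((alpha - 1.0) * math.log(x) - x / beta
               - alpha * math.log(beta) - math.lgamma(alpha))
    return math.exp(log_pdf)

def lognormal_pdf(x, mu, sigma):
    """Lognormal density; mu and sigma are the mean and std of ln(x)."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

# Compare the two densities from small to moderate CLWC values (mg m^-3).
for x in (1.0, 10.0, 100.0):
    print(x, gamma_pdf(x, 1.0, 77.0), lognormal_pdf(x, 5.0, 0.63))
```

Near x = 1 the lognormal density is many orders of magnitude smaller than the gamma density, so a data set with many small CLWC values strongly favors the gamma form in a likelihood comparison.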
4. Results
[34] The dependence of the CLWC distributions on cloud type is investigated with the four data sampling conditions (AC, NP, LP, and NP + LP) as described before. Figure 2 shows the CLWC distribution sets for the six cloud types (As, Ac, Ns, Sc, Cu, and Cb), along with the best fit gamma or lognormal distributions calculated using MLE. All the CLWC distributions shown in this paper correspond to data histograms normalized by the total number of data.
[35] For the AC and NP data sets, the CLWC distributions follow a gamma distribution. In contrast, the LP and NP + LP data sets follow a lognormal distribution. The transition of the PDF shape from the gamma to the lognormal distribution hints at issues with the mixed phase component of the retrieval algorithm. The parameters of the best fit gamma distribution for AC and NP and those of the best fit lognormal distribution for LP and NP + LP are listed in Table 3. In order to give a physical sense of the scale of the parameters, the corresponding means and standard deviations of the distributions are also listed in units of mg m^{−3}. The physical scale mean (μ_{physical}) and standard deviation (σ_{physical}) are related to the gamma parameters (α, β) and lognormal parameters (μ, σ) as follows,
$$ \text{gamma:} \quad \mu_{\mathrm{physical}} = \alpha\beta, \qquad \sigma_{\mathrm{physical}} = \sqrt{\alpha}\,\beta $$
$$ \text{lognormal:} \quad \mu_{\mathrm{physical}} = \exp\left(\mu + \frac{\sigma^{2}}{2}\right), \qquad \sigma_{\mathrm{physical}} = \mu_{\mathrm{physical}} \sqrt{\exp(\sigma^{2}) - 1} $$
The differences between the AC and NP data sets are much smaller than differences between the AC and LP data sets. The similarity between AC and NP distributions is partially explained by the CloudSat retrieval success rate, which is much higher for nonprecipitating clouds such as As, Ac, and Sc than for precipitating clouds such as Ns and Cb (see Table 1). Therefore, nonprecipitating clouds are overrepresented in the AC data set, thus making the AC and NP distributions look similar. The small differences between AC and NP appear mainly in the high value range of CLWC. In particular for boundary layer clouds (Sc and Cu) the NP condition reduces the frequency of occurrence of large retrieved values of CLWC as would be expected since we are filtering out the precipitating events.
Table 3. Parameters of the Best Fit Gamma (α, β) and Lognormal (μ, σ) Distributions for AC, NP, LP, and NP + LP Data Sets of the Cloud Types (As, Ac, Sc, Cu, Ns, Cb) Considered in This Study^{a}
Mean and Std are in mg m^{−3}.
Cloud  AC α  AC β  AC Mean  AC Std  NP α  NP β  NP Mean  NP Std  LP μ  LP σ  LP Mean  LP Std  NP+LP μ  NP+LP σ  NP+LP Mean  NP+LP Std
As  1.0  77  77  77  1.0  76  76  76  5.0  0.63  181  126  5.0  0.63  181  126
Ac  1.5  93  123  114  1.5  86  129  105  5.2  0.59  216  139  5.1  0.58  194  123
Sc  2.0  86  172  122  2.1  78  164  113  5.6  0.65  334  242  5.0  0.39  160  65
Cu  1.8  149  268  200  2.1  89  187  129  5.5  0.67  306  230  5.2  0.57  213  132
Ns  1.0  126  126  126  1.2  94  113  103  5.6  0.58  320  202  5.4  0.53  255  145
Cb  1.5  162  243  198  1.9  67  127  92  5.1  0.60  196  129  5.1  0.56  192  116
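The conversion from distribution parameters to a physical mean and standard deviation can be sketched with the standard moment formulas for the gamma (shape α, scale β) and lognormal (μ, σ) distributions. The function names are illustrative. Because the tabulated parameters are rounded, moments recomputed from them may differ from the tabulated Mean and Std in some rows.

```python
import math

def gamma_moments(alpha, beta):
    """Mean and standard deviation of a gamma distribution (shape alpha, scale beta)."""
    return alpha * beta, math.sqrt(alpha) * beta

def lognormal_moments(mu, sigma):
    """Mean and standard deviation of a lognormal distribution."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    std = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, std

# Sc, AC data set (alpha = 2.0, beta = 86) and As, LP data set (mu = 5.0, sigma = 0.63).
print(gamma_moments(2.0, 86.0))
print(lognormal_moments(5.0, 0.63))
```

For these two rows the recomputed values round to the Mean and Std entries given in Table 3 (172 and 122 for Sc in AC, 181 and 126 for As in LP).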
[36] On the other hand, limiting CLWC to T > 0°C (from the AC to the LP data set) causes a dramatic change from the gamma distribution to the lognormal distribution by considerably reducing the occurrences of CLWC < 10–30 mg m^{−3}. Figure 2 also shows that the larger differences between cloud types for small values of CLWC apparent in the AC and NP data sets are reduced significantly after the LP condition is applied. Since the inclusion or exclusion of the retrieved cloud water in the mixed phase range between −20°C and 0°C dramatically alters the characteristics of the CLWC distribution, more precise assessments of mixed phase clouds are needed in the future, which will require advancements in measurement technologies, retrieval algorithms, and/or multisensor research.
[37] One counterintuitive feature shown in Figure 2 is that Cu has larger CLWC than Cb regardless of data sampling condition. This is mostly explained by the CloudSat retrieval success statistics shown in Table 1. The CloudSat retrieval process changes the relative occurrence of precipitating and nonprecipitating portions of these clouds. The CloudSat retrieved CLWC distributions of Cb underrepresent the precipitating portion (due to more retrieval failures in the precipitating portion than in the nonprecipitating portion), and this diminishes the occurrence of high CLWC in the Cb distribution. In contrast, the retrieved CLWC distributions of Cu overrepresent the precipitating portion, increasing the frequency of high CLWC values.
[38] The dependence of the CLWC distribution on height and latitude is investigated by applying the aforementioned PDF estimation method with the gamma and lognormal distributions. Figure 3 shows the difference of the logarithm of the likelihood between the best fit gamma and lognormal distributions. The difference of the logarithm of the likelihood is given by
$$ \Delta \ln L = \sum_{i} \ln G(x_i) - \sum_{i} \ln LN(x_i) $$
where G(x) is the best fit gamma distribution function and LN(x) is the best fit lognormal distribution function. The set {x_{i}} is the CloudSat CLWC data. Positive (negative) values mean that the gamma (lognormal) distribution provides a higher likelihood value than the lognormal (gamma) distribution for a given data set.
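A minimal sketch of this comparison is shown below. It assumes that the best fit parameters of both distributions have already been estimated and uses the standard gamma (shape α, scale β) and lognormal (μ, σ) log-densities. The function name and argument order are illustrative.

```python
import math

def delta_log_likelihood(data, alpha, beta, mu, sigma):
    """Sum of ln G(x_i) minus sum of ln LN(x_i) over positive samples x_i.

    Positive values favor the gamma distribution, negative values the lognormal.
    """
    total = 0.0
    for x in data:
        log_gamma = ((alpha - 1.0) * math.log(x) - x / beta
                     - alpha * math.log(beta) - math.lgamma(alpha))
        z = (math.log(x) - mu) / sigma
        log_lognormal = -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
        total += log_gamma - log_lognormal
    return total
```

Because the difference is a sum over samples, its magnitude grows with the number of samples in a grid box, so values from boxes with very different sample counts are not directly comparable without normalization.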
[39] For the AC data set, the middle and lower troposphere in the tropical and subtropical regions are better described by a lognormal distribution except near the surface, while the middle and high latitudes are better described by a gamma distribution. The NP data set does not change the distribution likelihood structure very much, except in the boundary layer of the tropics and subtropics. Note, however, that the CloudSat retrievals are more uncertain or nonexistent in the first km above the surface because of surface clutter.
[40] The LP data set alters the distribution likelihood structure significantly. Most of the CLWC data at higher altitudes and near polar latitudes are removed because the observations corresponding to T < 0°C are filtered out. The remaining CLWC data exhibit a lognormal distribution, except in the boundary layer below 1 km. When NP + LP is considered, the distribution of CLWC is generally close to a lognormal distribution almost everywhere.
[41] The low-to-middle altitude regions in the tropical and subtropical latitudes, where the cloud is mainly in a pure liquid phase and is nonprecipitating, exhibit essentially no change of the lognormal distribution with the data sampling conditions, regardless of geolocation and cloud type. Table 4 summarizes the characterizations of the PDFs of CLWC with respect to the data-filtering scheme and geolocation.
Table 4. Characterizations of PDFs of CloudSat-Retrieved CLWC With Respect to Data Sampling Scheme, Described in Section 2, and Geolocation
Data Set  Boundary Layer  Middle to Lower Troposphere of Tropics and Subtropics  Middle to Upper Troposphere of Middle-to-Polar Latitude Regions
AC  Gamma  Lognormal  Gamma
NP  Lognormal  Lognormal  Gamma
LP  Gamma  Lognormal  N/A
NP + LP  Lognormal  Lognormal  N/A
[42] Variations of the distribution function type due to data sampling conditions in an illustrative grid box are shown in Figure 4. The distributions of two data sets are compared: AC and NP + LP for a grid box located at 3–4 km and 40°N–50°N (the box is highlighted in Figure 3). Figure 4 clearly illustrates how removing clouds with precipitation and supercooled liquid droplets affects the shape of the CLWC distribution. In particular, the frequency of CLWC < 10–30 mg m^{−3} is significantly reduced, making the observed distribution follow a lognormal distribution more closely than a gamma distribution. This occurs because of the temperature sampling (T < 0°C is filtered out) and confirms the results shown in Figure 3.
[43] Another interesting effect is the relative increase of the frequency of occurrence of larger values of CLWC for the NP + LP sampling. This is due to the fact that PDFs (normalized histograms) are shown and as such this apparent increase of the larger values of CLWC is simply a consequence of the significant reduction in the frequency of occurrence of lower values.
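The normalization effect can be made concrete with a toy histogram. The bin counts below are hypothetical numbers invented for illustration and are not CloudSat data.

```python
def normalize(counts):
    """Convert bin counts to relative frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical counts in three CLWC bins: low, middle, high.
ac_counts = [600, 300, 100]     # all successfully retrieved samples
nplp_counts = [100, 280, 95]    # after removing precipitating and T < 0 C samples

print(normalize(ac_counts))
print(normalize(nplp_counts))
```

In this toy case the number of samples in the high bin decreases slightly after filtering, yet its normalized frequency doubles, because most of the removed samples come from the low bin.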
[44] The detailed height dependence of the CLWC PDFs in the subtropics (20°N–30°N) for two different data sampling conditions (AC and NP + LP) is shown in Figure 5. For AC, the distribution gradually transitions from a lognormal to a gamma distribution as the height increases, which is attributed to the larger prevalence of small values of CLWC at higher altitudes due to the colder temperatures. In addition, for the AC set the mean value of the CLWC PDFs clearly decreases with increasing height, consistent with the concomitant decrease of temperature with height (and thus a decrease of saturation specific humidity and cloud water content). For NP + LP conditions, all values of CLWC above 6 km are removed and the remaining data exhibit a lognormal distribution.
[45] In Figure 5b, the decreased occurrence frequency of large values in the lowest layer in NP + LP is due to the significant removal of events associated with precipitation. It is clear from these results that many of the boundary layer clouds identified by CloudSat are precipitating. This does not imply that boundary layer clouds such as stratus or stratocumulus are precipitating more frequently and/or more intensely than deeper clouds such as shallow or deep cumulus. On one hand, precipitation in deeper clouds causes retrieval failure and as such will not appear in the statistics being discussed. On the other hand, for boundary layer clouds, the CloudSat retrieval algorithm produces a converged value for CLWC frequently even in the presence of light precipitation (see Table 1 for statistics).
[46] The latitude-height distribution of zonal mean CLWC PDFs is illustrated in Figure 6. The parameters of the best fit gamma distribution for the AC set show a noticeable correlation with height and latitude. The parameter α (the shape parameter) is correlated with both latitude and height such that it has a higher value in regions with higher temperature. In contrast, the parameter β (the scale parameter) seems to be generally correlated with height and regime type (note the slightly larger values in the NH and SH subtropics). While the correlations between the PDFs and geolocation parameters are apparent, the correlations with height may be an artifact of the constraint given by an a priori vertical profile used in the CloudSat retrieval algorithm. The underlying physical causes of the correlations are not obvious and require additional work. It is also important to note that the gamma distribution is not a good fit for the middle to lower tropical and subtropical troposphere for the AC data set (the lognormal distribution is a better fit); thus, the representation of CLWC distributions with these parameters is not as robust as in other regimes.
[47] Figures 6e and 6f also display the dependence of the PDF parameters for the best fit lognormal distributions of the NP + LP set. Unlike the gamma distribution parameters of the AC set, the lognormal parameters show a less coherent correlation with geolocation but instead indicate regional variations. Figure 6e confirms the finding of Li et al. [2008], who showed that, for nonprecipitating situations, CloudSat produces larger mean values of liquid water in the midlatitude lower troposphere. An interesting minimum in μ and σ between 1 and 3 km (close to the top of or above the boundary layer in most regions) hints at a lack of nonprecipitating clouds with large values of CLWC.
[48] Figure 6f shows larger variability of CLWC in the midtropospheric tropics and subtropics (between 3 and 6 km), presumably associated with transient cloudiness from convection and/or frontal events. In contrast, CLWC has less variability between 1 and 2 km in the tropics and subtropics, presumably associated with the steady state nature of the cloudy boundary layer in these regions and the general dryness of the regions just above the boundary layer. This discussion highlights the need to disentangle the temporal from the spatial variability of CLWC, which, as mentioned earlier, is difficult (or impossible) to do correctly owing to the sampling nature of the CloudSat observations.
[49] In order to facilitate the physical interpretation and direct comparison of the CLWC distributions, the mean and standard deviations corresponding to each distribution are also plotted in Figures 6c and 6d for the AC data set and in Figures 6g and 6h for the NP + LP data set. One noticeable difference between the two data sets is a considerable reduction of the mean and standard deviation in the boundary layer (height between 0 and 2 km) from AC to NP + LP. This reduction is attributed to the no-precipitation (NP) sampling condition, which removes high-value CLWC samples in precipitating clouds.
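The conversion from fitted parameters to a physical mean and standard deviation can be sketched with the standard moment formulas. The sketch below is a minimal illustration in Python and not the authors' code. It assumes that α and β are the shape and scale parameters of the gamma PDF and that μ and σ are the mean and standard deviation of ln(CLWC); the function names and numerical values are hypothetical.

```python
import math

def gamma_moments(alpha, beta):
    """Mean and standard deviation of a gamma PDF (shape alpha, scale beta)."""
    return alpha * beta, math.sqrt(alpha) * beta

def lognormal_moments(mu, sigma):
    """Mean and standard deviation of a lognormal PDF.

    Assumes mu and sigma are the mean and standard deviation of ln(x).
    """
    mean = math.exp(mu + 0.5 * sigma ** 2)
    std = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, std

# Hypothetical parameter values, for illustration only.
print(gamma_moments(2.0, 50.0))
print(lognormal_moments(5.0, 0.5))
```

Under these assumptions the returned values carry the same units as the CLWC data, which is what allows a gamma fit and a lognormal fit to be compared directly.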
[50] Figure 7 illustrates global distributions of CLWC PDF characteristics between 1 and 3 km. The likelihood difference between gamma and lognormal distributions (Figures 7a and 7f) shows that in the lower troposphere the lognormal distribution is a better fit in most regions except high latitudes. The best fit lognormal parameters μ and σ are plotted in Figures 7b and 7c (Figures 7g and 7h) for the AC (NP + LP) data set. Figure 7b shows larger values of μ in the stratocumulus and trade cumulus regions off the subtropical west coast of continents, implying larger values of CLWC. The fact that this behavior is not observed in Figure 7g suggests many of these clouds are precipitating.
[51] Figure 7c shows no latitude or regional dependence of σ in the oceanic tropics and subtropics. However, Figure 7h shows a reduction of σ in the stratocumulus regions when precipitating clouds are eliminated. Overall, the variability of the lognormal distribution parameters is reduced when precipitation and ice cloud contributions are filtered out. In addition to the lognormal distribution parameters, the corresponding mean and standard deviation of the CLWC distribution in units of mg m^{−3} are plotted for physical interpretation in Figures 7d and 7e for the AC set and in Figures 7i and 7j for the NP + LP set.
5. Discussion
[52] All the data used in the present work have uncertainties and biases associated with the CloudSat retrieval process and CloudSat radar measurement sensitivity. It is important to quantify the impact of the uncertainties and biases on the CLWC distribution analysis.
[53] The CloudSat retrieval process uses an optimal-estimation approach [Rodgers, 2000] in which an a priori vertical profile serves as a constraint on the retrieval together with an a priori covariance matrix representing the uncertainty of the profile. The retrieved solution is obtained by minimizing a cost function that is a weighted sum of the difference between the measurement and forward model vectors and the difference between the retrieved state vector and the a priori vector. This approach either biases the retrieved solution toward the a priori vector or leads to unsuccessful retrievals (i.e., solutions that do not converge) if the state vector to be retrieved is dissimilar from the a priori vector. Which a priori vector is used for a given radar measurement vector is determined by classifying cloud type, which has its own uncertainties (i.e., misclassification). Therefore, the resulting distributions of the retrieved CLWC may be narrower and more biased toward the a priori vector than the “true” CLWC distribution.
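The pull toward the a priori value can be seen in a scalar toy problem. The sketch below is not the CloudSat retrieval algorithm. It assumes a linear forward model y = k x and a quadratic cost function in which each squared difference is weighted by an inverse variance; the names k, s_y, and s_a and all numerical values are hypothetical.

```python
import statistics

def retrieve(y, x_a, k=1.0, s_y=1.0, s_a=1.0):
    """Closed-form minimizer of J(x) = (y - k*x)**2 / s_y + (x - x_a)**2 / s_a.

    s_y and s_a are the assumed measurement and a priori variances.
    """
    return (k * y / s_y + x_a / s_a) / (k * k / s_y + 1.0 / s_a)

x_a = 200.0                          # hypothetical a priori value
x_true = [50.0, 200.0, 400.0]        # hypothetical true states
y_obs = [1.0 * x for x in x_true]    # noise-free measurements with k = 1
x_hat = [retrieve(y, x_a) for y in y_obs]

# Each retrieved value lies between the a priori value and the true state,
# so the retrieved set has a smaller spread than the true set.
print(x_hat)
print(statistics.pstdev(x_true), statistics.pstdev(x_hat))
```

In this toy setting, increasing s_a relative to s_y weakens the constraint and moves the retrieved values closer to the true states.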
[54] The retrieved CLWC in precipitating clouds has larger uncertainties than in nonprecipitating clouds because the radar reflectivity is highly sensitive to large droplets and the droplet size distribution in precipitating clouds deviates from the assumed lognormal distribution. We addressed this uncertainty by filtering out the data with precipitation (NP data set) and found that the CLWC distribution does not change its characteristics dramatically (Figures 2a and 2b). The favored distribution (either gamma or lognormal) remained the same before and after the NP filtering. However, this similarity is partly attributed to the high retrieval failure rate of precipitating clouds, which leads to the underrepresentation of precipitating clouds in the AC set. Note also that if the droplets are sufficiently small, the CloudSat warm/liquid phase cloud retrievals will not contain low CLWC values, whether or not such values actually exist.
[55] Another major source of the CloudSat data uncertainty is the treatment of mixed phase cloud in the CloudSat retrieval process, where the air temperature is used to partition the cloud into the liquid and ice phases. The mixed phase cloud contributes to a large frequency of the low CLWC values as seen in the data distribution of the AC data set (Figure 2a) in comparison with the LP data set (Figure 2c). When the ice portion of the cloud is removed (LP data set), all CLWC distributions are lognormal as seen in Figure 3c in contrast to the gamma distributions present in Figure 3a (AC data set). The present study shows that misidentification of cloud phase could impact the characteristic structure of CLWC distributions.
[56] The low values of CLWC, a critical part of distinguishing the PDF shape between gamma and lognormal distributions, are highly uncertain because of the relative insensitivity of the radar reflectivity measurement to small droplets in addition to the ambiguity of mixed phase clouds. In order to assess the uncertainties of low CLWC values, the PDF is reestimated in three separate tests by excluding CLWC values lower than 10, 20, and 30 mg m^{−3}. For the CLWC distribution shown in Figure 1a, the gamma distribution is initially the best fit function and remains the best fit function when excluding CLWC <10 mg m^{−3}. However, when data with CLWC <20 or 30 mg m^{−3} are removed, it becomes ambiguous whether CLWC follows a gamma or a lognormal distribution (not shown). This experiment illustrates the importance of low CLWC values in characterizing the shape of the CLWC PDF.
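A threshold test of this kind can be sketched in Python on synthetic data. This is an illustration only, not the authors' procedure: the sample is drawn from a gamma PDF with hypothetical parameters rather than taken from CloudSat, the full (untruncated) gamma and lognormal densities are refitted to each thresholded sample, and the gamma shape parameter is found by a simple bisection with a numerical digamma function. All function names are hypothetical.

```python
import math
import random

def lognormal_fit(x):
    """Closed-form lognormal MLE; returns (mu, sigma, log-likelihood)."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)
    loglik = sum(-l - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
                 - (l - mu) ** 2 / (2.0 * sigma ** 2) for l in logs)
    return mu, sigma, loglik

def digamma(a, h=1e-5):
    """Numerical derivative of ln Gamma(a) by central differences."""
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2.0 * h)

def gamma_fit(x):
    """Gamma MLE (shape alpha, scale beta); returns (alpha, beta, log-likelihood).

    Solves ln(alpha) - digamma(alpha) = ln(mean) - mean(ln x) by bisection.
    """
    n = len(x)
    mean = sum(x) / n
    s = math.log(mean) - sum(math.log(v) for v in x) / n
    lo, hi = 1e-3, 1e3
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        # The left-hand side decreases with alpha, so move up if it is too large.
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    alpha = math.sqrt(lo * hi)
    beta = mean / alpha
    loglik = (sum((alpha - 1.0) * math.log(v) - v / beta for v in x)
              - n * (math.lgamma(alpha) + alpha * math.log(beta)))
    return alpha, beta, loglik

random.seed(0)
# Synthetic sample from a gamma PDF (hypothetical shape 1.2, scale 150).
sample = [random.gammavariate(1.2, 150.0) for _ in range(5000)]

results = {}
for threshold in (0.0, 10.0, 20.0, 30.0):
    kept = [v for v in sample if v >= threshold]
    _, _, ll_gamma = gamma_fit(kept)
    _, _, ll_logn = lognormal_fit(kept)
    # Positive values favor the gamma fit, negative values the lognormal fit.
    results[threshold] = (ll_gamma - ll_logn) / len(kept)
    print(threshold, len(kept), results[threshold])
```

The per-sample log-likelihood difference gives one simple way to track how the preference between the two functional forms changes as progressively more low values are excluded.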
[57] Finally, we test to what extent the reported uncertainty of the CLWC values affects the PDF estimation. The CloudSat 2B-CWC-RO version R04 data product provides a measure of the uncertainty in the retrieved CLWC, which is the diagonal element of the error covariance matrix. The uncertainty is affected by both the uncertainty in the measured radar reflectivity values and the uncertainty in a priori data for the retrieval process. We use the uncertainty as the standard deviation of Gaussian noise in the data. A retrieved CLWC value x_{i} with uncertainty σ_{i} can be expressed as the sum of the unknown “true” value 〈x_{i}〉 and the Gaussian noise N(0, σ_{i}^{2}):
x_{i} = 〈x_{i}〉 + N(0, σ_{i}^{2})
Given this distribution, the probability p(x_{i}) of getting the retrieved value x_{i} is now the joint probability of getting the true value 〈x_{i}〉 from the assumed PDF f(x) and getting x_{i} from the Gaussian noise around the true value 〈x_{i}〉:
p(x_{i}) = f(〈x_{i}〉) (2πσ_{i}^{2})^{−1/2} exp[−(x_{i} − 〈x_{i}〉)^{2} / (2σ_{i}^{2})]
Since the “true” value 〈x_{i}〉 is unknown, it is approximated with the retrieved value x_{i} as a first-order estimation. Then the joint probability becomes
Consequently, the likelihood of getting the data series of {x_{i}} is given by
This likelihood gives a higher (lower) weight for data with a lower (higher) uncertainty. Maximizing the likelihood with respect to the function parameters gives the best fit function for a given functional form, which takes into account the uncertainty associated with the data.
[58] Figure 8 shows this likelihood maximization and demonstrates that the best fit functional shape (either gamma or lognormal) essentially remains unchanged and that the best fit parameters are within 15% of those obtained with the original likelihood estimation given by equation (1). In this particular case, the best fit lognormal parameters for the two PDFs are μ = 5.24 and σ = 0.57 without uncertainty and μ = 5.22 and σ = 0.63 with uncertainty. Most of the results presented in the paper are obtained without taking this uncertainty into account. This experiment suggests that the main conclusions drawn in this paper are largely valid even though the uncertainty is not considered explicitly.
6. Summary
[59] We characterized the distributions of CloudSat-retrieved cloud liquid water content (CLWC) data sampled during 2007 using maximum likelihood estimation (MLE). The best estimate PDFs of CLWC are found to closely follow either a gamma or a lognormal distribution depending on cloud phase, cloud type, the occurrence of precipitation, and geolocation. In the tropical and subtropical latitudes between 1 and 6 km, where nonprecipitating and pure liquid phase clouds are dominant, the PDFs of CLWC are best described by a lognormal distribution. In contrast, at altitudes above 6 km and in regions poleward of the midlatitudes that contain a high frequency of supercooled liquid cloud droplets, a gamma distribution best explains the CLWC distributions, primarily due to an increased occurrence of low values.
[60] The data sampling with respect to cloud phase and precipitation significantly affects the distribution characteristics of CLWC in some regions. After removing the contributions of supercooled water and precipitation, the CLWC distribution transitions from a gamma to a lognormal distribution in (1) high altitudes and midlatitude-to-polar regions, where the contribution of supercooled or mixed phase clouds is significant, and (2) the lower troposphere, where precipitating clouds are frequent and CloudSat has significant observational limitations.
[61] Some parameters of the CLWC distributions show a noticeable relationship with latitude and height. For the all cloud (AC) data set, the shape parameter (α) of the gamma distribution is correlated with latitude and height such that it is positively correlated with air temperature, while the scale parameter (β) is correlated primarily with height. For the no precipitation, liquid phase (NP + LP) data set, the lognormal distribution parameters show a less coherent relationship with latitude and height but do show some regional structure. The mean parameter (μ) of the lognormal distribution exhibits peak values in the stratocumulus and trade wind cumulus regimes. The variability parameter (σ) is reduced in the same regimes and in the midtroposphere.
[62] In the global distribution of the lognormal parameters in the boundary layer (1–3 km) with the AC set, the stratocumulus regions off the subtropical west coast of continents show the largest mean parameters (μ). The oceanic tropics and subtropics show no clear regional or latitude dependence in the variability parameter (σ). When precipitation and supercooled droplets are removed (NP + LP), the highest mean values (μ) of CLWC are found in the midlatitudes between 1 and 3 km and the variability of CLWC (σ) is strongly diminished in the stratocumulus and trade cumulus regions.
[63] Detailed information about PDFs of vertically resolved liquid water content from satellite observations such as those obtained from CloudSat can offer useful quantitative constraints for the development of parameterizations of cloud microphysics that take into account the effects of subgrid variability. However, the results of this work also show that the uncertainties, biases, and insufficient sampling of hydrometeors by the CloudSat radar lead to inherent uncertainties in the quantification of CLWC PDFs. Another pressing issue is to disentangle the temporal from the spatial variability in addition to quantifying observational and retrieval uncertainties. Studies that use combinations of sensors, e.g., radar, lidar, passive visible, infrared, and microwave, may offer more precise estimates of PDFs of cloud liquid water content profiles.