[5] The earliest prelaunch versions of the CloudSat ice retrieval followed the algorithm described by *Benedetti et al.* [2003], modified to use radar alone (with no optical depth input). These represented the ice cloud particles as a distribution of ice spheres of fixed density whose size was modeled using a three-parameter modified gamma distribution. In the radar-only (RO) retrieval case, the state vector (containing the unknowns to be retrieved) was composed of an array of characteristic diameter values (one of the three size distribution parameters) corresponding to the cloudy bins of the measured radar profile. Thus for a measurement vector containing *p* cloudy bins, the retrieval would solve for *p* values of the characteristic diameter. The other two size distribution parameters (the particle number concentration and the distribution width parameter) were assigned fixed values and uncertainties (based on climatology, field data, or other criteria); these forward model parameters were constrained to be height-invariant. Once the elements of the state vector were determined, values of typical microphysical parameters such as effective radius and ice water content (IWC) were easily calculated in terms of the size distribution parameters. The remaining inputs to the retrieval consisted of an a priori vector and covariance matrix, corresponding to the best knowledge of the elements of the state vector before the measurement is made. These were determined in a manner similar to the forward model parameters. An augmented retrieval using a combination of radar plus visible optical depth (RVOD) in the measurements vector was constructed similarly. (This is in fact the case described by *Benedetti et al.* [2003].) This augmented version added the (height-invariant) particle number concentration to the state vector, allowing this parameter to be retrieved rather than prescribed.

#### 2.1. Motivation for Improvements

[6] While these early retrieval versions showed promise and were straightforward to implement, there were some difficulties in their use. (Parallel difficulties were found in the early CloudSat liquid cloud retrieval, which had a similar structure and was under simultaneous development.) First, the retrieval occasionally failed to converge, because no combination of state vector elements and (fixed) forward model parameters could be found that was consistent with both the radar measurements and the a priori data throughout the cloud column. Second, the forward model parameters (number concentration and distribution width parameter) were poorly known and therefore had to be specified with large uncertainty. This reduced both the accuracy of the retrieval (for cases where the true values differed from the assigned fixed values) and the precision of the results (owing to the large parameter uncertainties propagating into the uncertainties of the results). Finally, calculations of uncertainty in derived products such as ice water content require knowledge of the covariance between variables that were part of the state vector (e.g., characteristic diameter, which was being retrieved) and those that were not (e.g., particle number concentration, which was held at a height-invariant fixed value and not being retrieved), but there was no source for this information. The retrieval process provides a covariance matrix, but this matrix contains only covariances between the elements of the state vector.

[7] In order to overcome these difficulties, an improved algorithm was devised, following the lead of the improved CloudSat liquid cloud retrieval by adding the height-invariant number concentration and distribution width parameter (still using the modified gamma distribution) to the state vector, thereby retrieving values of these parameters in accordance with the measurements, a priori data, and the uncertainties in each. While this change causes the seemingly awkward effect of making the state vector longer than the measurements vector (i.e., retrieving *p* + 2 quantities from *p* or *p* + 1 measurements), a state space diagram shows how a priori data provide the additional constraints necessary to achieve a unique solution (Figure 1). In Figure 1, representing an RO retrieval for a single-bin cloud, the three axes correspond to the three parameters of the particle size distribution. The curved surface grid represents the locus of points (derived from the size distribution parameters) that fit the radar measurement exactly. Note that this surface is not orthogonal to any of the three coordinate directions, so no PSD parameter is solely determined by the radar measurement, but it is fair to say that the exact measurement surface is most orthogonal to the characteristic diameter *D*_{n}. Thus one might say that the characteristic diameter is mostly determined by the radar measurement and the number concentration and width parameter are mostly determined by the a priori data, although both data sources contribute to all three parameters. The a priori data point is shown in Figure 1: the retrieval process selects a point in state space that is consistent with the measurements and the a priori data and the uncertainties in each. 
This illustrates the major improvement over the previous algorithm: the number concentration *N*_{T} and width parameter *ν* coordinates of the solution point are free to vary in the neighborhood of the a priori point (depending on relative uncertainties), rather than being fixed to prescribed values.

[8] The above improved algorithm served as the ice water content retrieval algorithm in version 5.0 of the CloudSat 2B-CWC-RO standard data product. This version first appeared in Release 3 (R03) of this product, which was designated a “beta” version and released to the science team and community on a limited basis. During the period from January to October 2007, a number of tests were performed on this product. One test whose results were particularly amenable to evaluation of the retrieval as a function of a number of variables was an ice cloud retrieval intercomparison study described by *Heymsfield et al.* [2008]. Evaluations using results from this intercomparison are discussed in section 4.

#### 2.2. Height and Temperature Dependence

[9] The algorithm in the previous section was shown to perform well overall, but it suffered from bias when retrieving the particles at the coldest temperatures and lowest reflectivities and had a greater RMS error than would be preferred. It was believed that the use of the height-invariant number concentration and distribution width parameter was a factor in both of these shortcomings. If these parameters were allowed to vary with altitude (which seemed more realistic) and if a temperature dependence could be incorporated into the retrieval in some way, it seemed likely that the performance should improve.

[10] The key change in formulation in the current algorithm is the change to height-varying retrievals of all three size distribution parameters. This is accomplished by expanding the state vector to include complete profiles of each of the parameters, allowing other height-dependent information to be utilized.

[11] Once the retrieval framework was changed so that profiles of all three distribution parameters are retrieved, it became possible to include temperature dependence through the a priori data. Since temperature is one of the most important factors influencing cloud particle evolution, knowledge of the temperature (whether by measurement or modeling) should help narrow the part of the state space selected for consideration in the retrieval process. The a priori data represent our best knowledge of the state vector before the (radar) measurement is made, so making them dependent on temperature is the most direct way to incorporate this information: if we know the temperature of a particular cloudy region, then we have a better idea of the cloud properties than we would without it.

[12] In order to determine useful temperature-dependent values of the size distribution parameters, a number of in situ 5-s average ice particle size spectra were examined, including 5796 measurements from four flights during the ARM 2000 Cloud IOP, 2727 measurements from one flight during the AIRS experiment (Canada), and 3709 measurements from ten flights during CRYSTAL-FACE. The spectra were specified in terms of the particle maximum dimension. A mass-dimension relation [*Heymsfield et al.*, 2007] was used to convert the spectra to corresponding distributions of equivalent-mass spheres for use by the retrieval. (Direct measurements of IWC were used to constrain the mass-dimension relationships of *Heymsfield et al.* [2007].) Each converted particle spectrum was then fit by a three-parameter distribution function. The fits were weighted to place more emphasis on the points with *D* > 100 *μ*m. Initially, a modified gamma function of the same type used in the prior retrievals was used for the fit, but this selection proved problematic, because the fit often resulted in a function that grew without bound as the size approached zero, making integrals over the size distribution diverge. Rather than imposing a minimum particle size on the distribution and dealing with more complex mathematical expressions (due to the truncated distribution) and the issue of what the minimum size should be, it was decided to change to a lognormal size distribution (defined below in (4)). This choice had the advantage of better behavior in the small-particle limit (while retaining analytic expressions for moments of the distribution) and of having the same form as the size distribution used in the corresponding liquid water retrieval, which may be of benefit in future crossover development. The choice of a lognormal form is also consistent with *McFarquhar and Heymsfield* [1997], who described a bimodal size distribution with a lognormal function representing the larger particles.
(It should be noted that, while the semi-infinite lognormal distribution used in the following analysis extends to sizes both larger and smaller than the actual particle size distribution, the errors incurred by using the untruncated spectra are inconsequential because the integrands are dominated by the 25- to 2000-*μ*m size range.)

[13] Scatterplots of the three lognormal distribution parameters as a function of temperature are shown in Figure 2, together with linear least-squares fits. The resulting expressions for the fits are as follows:

with standard deviations in log *N*_{T}, *ω*, and log *D*_{g} of 0.555, 0.235, and 0.226, respectively. (The notation “log” indicates the common logarithm; “ln” is used for the natural logarithm.) These expressions were obtained from the synoptic probe data: a separate fit process was performed on convective data, but the convective fits were not used in this version of the retrieval. The current retrieval algorithm is described in detail in sections 2.3 and 2.4.
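A fit of this kind can be sketched in a few lines. The example below is a minimal illustration, assuming an ordinary (unweighted) linear least-squares fit of one distribution parameter against temperature and a residual standard deviation as the spread statistic; the function name and the synthetic inputs in the usage note are hypothetical and are not the probe data or the coefficients of (1) through (3).

```python
import numpy as np

def fit_vs_temperature(temp_c, values):
    """Linear least-squares fit values ~ slope * T + intercept.

    Returns (slope, intercept, residual standard deviation).
    The residual spread uses ddof=2 because two coefficients are fitted.
    """
    temp_c = np.asarray(temp_c, dtype=float)
    values = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(temp_c, values, 1)  # highest power first
    residuals = values - (slope * temp_c + intercept)
    return slope, intercept, residuals.std(ddof=2)
```

Called with an array of temperatures and, for example, the corresponding values of log *N*_{T}, the third return value plays the role of the standard deviations quoted above, under the assumption that those are residual spreads about the fitted line.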

#### 2.3. Forward Model (Current Algorithm)

[14] The forward model developed for the retrieval assumes a lognormal size distribution of ice crystals,

where *N*_{T} is the ice particle number concentration, *D* is the diameter of an equivalent mass ice sphere, *D*_{g} is the geometric mean diameter, ln indicates the natural logarithm, and *ω* is the width parameter. The distribution in (4) is fully specified by three parameters: *N*_{T}, *D*_{g}, and *ω*. The ice water content (IWC) and the effective radius *r*_{e} are defined in terms of moments of the size distribution,

where *ρ*_{i} is the density of ice (fixed at 917 kg m^{−3} in this version).
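The display forms of (4) through (6) are not reproduced in this text. As a hedged reconstruction from the definitions above (the standard lognormal form and the usual moment definitions, without any unit-conversion factors, and not a verbatim copy of the original equations), they would read:

$$
N(D) = \frac{N_T}{\sqrt{2\pi}\,\omega D}\,\exp\!\left[-\frac{\ln^2\!\left(D/D_g\right)}{2\omega^2}\right],
\qquad
\mathrm{IWC} = \int_0^\infty \rho_i\,\frac{\pi}{6}\,D^3 N(D)\,dD,
\qquad
r_e = \frac{1}{2}\,\frac{\int_0^\infty D^3 N(D)\,dD}{\int_0^\infty D^2 N(D)\,dD}.
$$

For a lognormal distribution of this form, every moment has the closed form

$$
\int_0^\infty D^k N(D)\,dD = N_T\,D_g^{\,k}\,\exp\!\left(\frac{k^2\omega^2}{2}\right),
$$

which is the sense in which analytic expressions for the moments are retained.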

[15] For thin ice clouds, the cloud ice particles are sufficiently small to be modeled as Rayleigh scatterers at the CloudSat radar wavelength (3.2 mm) and sufficiently large that their extinction efficiency approaches 2 for visible wavelengths. These assumptions yield the following definitions of radar reflectivity factor *Z*_{Ray} and visible extinction coefficient *σ*_{ext}:

Using (4) for the size distribution in (5) through (8) gives the following equations for the various cloud properties using the units denoted in bracketed subscripts,

All of these properties are functions of position in the cloud column; we can therefore write IWC(*z*), *r*_{e}(*z*), *σ*_{ext}(*z*), and *Z*_{Ray}(*z*). Equations (9) through (12) express the intrinsic properties of the cloud as functions of the parameters of the assumed particle size distribution. The parameters IWC and *r*_{e} are the quantities we seek to retrieve, and values of *Z* are related to our measurements. We may also specify the ice water path (IWP) as the vertical integral of IWC over the cloud column,

$$\mathrm{IWP} = \int_{z_{\mathrm{base}}}^{z_{\mathrm{top}}} \mathrm{IWC}(z)\,dz. \tag{13}$$
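To illustrate how the bulk properties follow from the three size distribution parameters, the sketch below evaluates the lognormal moments in closed form. It assumes the standard lognormal form with the width *ω* expressed in natural-log units, works in SI units rather than the bracketed units of (9) through (12), and omits the non-Rayleigh correction. The function names are illustrative and are not taken from the CloudSat code.

```python
import math

RHO_ICE = 917.0  # kg m^-3, the fixed ice density quoted in the text

def lognormal_moment(n_t, d_g, omega, k):
    """k-th moment of a lognormal N(D): N_T * D_g**k * exp(k**2 * omega**2 / 2)."""
    return n_t * d_g**k * math.exp(0.5 * k * k * omega * omega)

def cloud_properties(n_t, d_g, omega):
    """Bulk properties in SI units (n_t in m^-3, d_g in m, omega dimensionless)."""
    m2 = lognormal_moment(n_t, d_g, omega, 2)
    m3 = lognormal_moment(n_t, d_g, omega, 3)
    m6 = lognormal_moment(n_t, d_g, omega, 6)
    return {
        "iwc": RHO_ICE * math.pi / 6.0 * m3,  # kg m^-3
        "r_e": 0.5 * m3 / m2,                 # m
        "sigma_ext": 0.5 * math.pi * m2,      # m^-1, extinction efficiency of 2
        "z_ray": m6,                          # m^6 m^-3, Rayleigh regime
    }

def ice_water_path(iwc_profile, dz):
    """Rectangle-rule column integral for bins of uniform thickness dz (m)."""
    return sum(iwc_profile) * dz
```

Because the reflectivity scales with exp(18 *ω*²) while IWC scales with exp(4.5 *ω*²), a modest change in the width parameter shifts the reflectivity far more than the ice water content, which is one reason the width parameter matters so much to the retrieval.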

[16] Because radar reflectivity in the Rayleigh regime is a function of the sixth power of the particle diameter, a significant error (overestimate in reflectivity) may be introduced by use of the Rayleigh approximation (7) on the large crystals that violate the Rayleigh criterion (even if these coarser particles are few in number). To compensate for this error, a procedure similar to that used by *Benedetti et al.* [2003] was used to obtain a correction function to account for non-Rayleigh scattering of large particles. The ratio of Mie to Rayleigh scattering was calculated for a range of distribution parameters and then fit with approximation functions to preserve differentiability. In order to better reflect the possible variety of distribution shapes, the fit for this algorithm was done in terms of two distribution parameters,

where

and the coefficients *a*_{01}, *a*_{02}, *a*_{03}, *a*_{11}, *a*_{12}, *a*_{21}, and *a*_{22} have values of 0.99, −0.965, 0.25, 0.9688, 0.02, 0.0625, and 0.000001, respectively. As before, this ratio is unity for small particle sizes and decreases with larger particles. (*Benedetti et al.* [2003] investigated the error introduced by the assumption of spherical particles and their similar *f*_{Mie} for various crystal habits as a function of effective diameter and found differences on the order of 10%.)

[17] The radar reflectivity factor *Z*_{Ray} in (11) is defined with respect to ice, but remote sensing radars (including the CPR) conventionally measure the equivalent radar reflectivity factor *Z*_{e} (defined with respect to liquid water) [*Smith*, 1984],

where the dielectric factor *K* is defined in terms of the index of refraction *m*,

Calculating these values for liquid water and ice at the CloudSat frequency results in the following value for the ratio of the ice and liquid water dielectric factors (where the ice and liquid temperatures are assumed to be −20°C and 7°C, respectively):

Treating density as constant and including the conversion to equivalent radar reflectivity factor, the new forward model can be written as
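The conversion from a reflectivity factor defined with respect to ice to one defined with respect to liquid water can be sketched as follows. This is a minimal illustration assuming the standard definition *K* = (*m*² − 1)/(*m*² + 2) and a conversion by the ratio of squared dielectric-factor magnitudes. The refractive indices are caller-supplied inputs (no values are asserted here), the function names are hypothetical, and the non-Rayleigh correction is left out.

```python
def dielectric_factor_sq(m):
    """|K|^2 with K = (m^2 - 1) / (m^2 + 2) for a complex refractive index m."""
    m2 = complex(m) ** 2
    return abs((m2 - 1.0) / (m2 + 2.0)) ** 2

def equivalent_reflectivity(z_ray, m_ice, m_water):
    """Scale a reflectivity factor defined for ice to the liquid-water convention."""
    ratio = dielectric_factor_sq(m_ice) / dielectric_factor_sq(m_water)
    return ratio * z_ray
```

Since the ratio is a constant multiplier once the two temperatures are fixed, it appears as a constant offset when reflectivities are expressed in dBZ.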

#### 2.4. Retrieval Algorithm

[18] The retrieval uses an approach described by *Rodgers* [1976, 1990], *Marks and Rodgers* [1993], and *Rodgers* [2000], where a vector of measured quantities **y** (here, reflectivities in cloudy radar resolution bins) is related to a state vector of unknowns **x** (size distribution parameters) by the forward model **F**,

where ε_{y} represents measurement errors. *Rodgers* [1976] described an optimal-estimation technique in which a priori profiles are used as virtual measurements, serving as a constraint on the retrieval. An a priori profile **x**_{a} is specified on the basis of likely or statistical values of the state vector elements, together with an a priori covariance matrix **S**_{a} representing the variability or uncertainty of this profile and any known correlations among the profile values.

[19] The retrieval algorithm obtains the optimal solution by maximizing the a posteriori probability *P*(**x**∣**y**), where, from Bayes' theorem,

Assuming that the elements of **x** and **y** have joint Gaussian probability distribution functions, this is equivalent to minimizing a cost function Φ that represents a weighted sum of the state vector–a priori difference (from *P*(**x**)) and the measurement vector-forward model difference (from *P*(**y**∣**x**)),

The solution is obtained by iteration using successive estimates of the **x** vector and the **K** matrix (**K** = ∂**F**/∂**x**). These quantities are also used to provide information on convergence, the quality of the solution, and the amounts and sources of retrieval uncertainty. The iterative solution takes the following form:

where the superscripts *i* and *i* + 1 indicate the iteration number and **x̂** indicates an estimate or retrieved value of **x**.
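The iteration can be sketched generically. The example below is a minimal illustration, assuming the Gaussian cost function $\Phi = (\mathbf{x}-\mathbf{x}_a)^T \mathbf{S}_a^{-1} (\mathbf{x}-\mathbf{x}_a) + (\mathbf{y}-\mathbf{F}(\mathbf{x}))^T \mathbf{S}_y^{-1} (\mathbf{y}-\mathbf{F}(\mathbf{x}))$ and a Gauss-Newton update of the kind given by *Rodgers* [2000]. The measurement covariance symbol $\mathbf{S}_y$, the function signature, and the stopping rule are assumptions made for illustration and are not the operational CloudSat code.

```python
import numpy as np

def optimal_estimation(y, forward, jacobian, x_a, s_a, s_y, n_iter=20, tol=1e-8):
    """Gauss-Newton iteration toward the maximum a posteriori state.

    forward(x) returns F(x); jacobian(x) returns K = dF/dx at x.
    Returns the final state and the covariance S_x evaluated at the
    last linearization point.
    """
    y = np.asarray(y, dtype=float)
    x_a = np.asarray(x_a, dtype=float)
    s_a_inv = np.linalg.inv(s_a)
    s_y_inv = np.linalg.inv(s_y)
    x = x_a.copy()  # start from the a priori state
    for _ in range(n_iter):
        k = jacobian(x)
        s_x = np.linalg.inv(s_a_inv + k.T @ s_y_inv @ k)
        x_new = x_a + s_x @ k.T @ s_y_inv @ (y - forward(x) + k @ (x - x_a))
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    return x, s_x
```

The a priori term keeps the matrix being inverted well conditioned even when there are more unknowns than measurements, which is how a state vector longer than the measurement vector can still yield a unique solution.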

##### 2.4.1. State and Measurement Vectors

[20] The state vector **x** is the vector of unknown cloud parameters to be retrieved. For a cloud reflectivity profile consisting of *p* cloudy bins, the state vector will have *n* = 3*p* elements,

where *D*_{g}(*z*_{i}), *N*_{T}(*z*_{i}), and *ω*(*z*_{i}) are the geometric mean diameter, number concentration, and distribution width parameter for height *z*_{i} (where *z*_{1} refers to the bin at cloud base). The units of *D*_{g} are millimeters and the units of *N*_{T} are m^{−3}; *ω* is dimensionless.

[21] A state space diagram for the current algorithm is shown in Figure 3. Again, the state space diagram gives a useful indication of the influence of a priori data versus measurements. As the measured radar reflectivity changes, the gridded surface expands or contracts. Thus retrieval coordinates orthogonal to this surface are determined by the measurement, while coordinates parallel to the surface are determined by the a priori values.

##### 2.4.2. Forward Model

[24] The forward model **F**(**x**) relates the state vector **x** to the measurement vector **y**. **F** therefore has the same dimension as **y**,

where the individual elements are given by the following expression:

The subscript FM is a reminder that these quantities are calculated from elements of **x** according to the forward model equation (29), as opposed to the elements of the **y** vector, which are measured quantities.

##### 2.4.3. A Priori Data and Covariance

[25] A priori data for the retrieval help prevent outliers in the solution and constrain the solution where the measurements cannot. The a priori vector **x**_{a} is specified as follows:

We also specify an a priori error covariance matrix **S**_{a},

where all the off-diagonal elements are set to zero. Correlations between the elements of the cloud column probably exist and ideally would be represented in (31), but such information is less readily available than mean and variance statistics and was therefore not included. This choice is consistent with the *Rodgers* [2000] description of a maximum entropy retrieval, in which the solution is constrained as little as possible consistent with the available parameters, and it avoids issues of reliability and sampling of covariance data. The impact of omitting off-diagonal values from **S**_{a} was examined in a synthetic data case for a similar liquid water retrieval and was found to be minimal, but there are a number of ways that covariance information might be utilized, so this is a topic for future research.
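Assembling the a priori inputs for a column of *p* cloudy bins can be sketched as below. The ordering of the state vector (all *D*_{g} values, then all *N*_{T} values, then all *ω* values) and the use of a single standard deviation per parameter are assumptions made for illustration, as is the function name. The sketch also assumes that each parameter is stored in the same representation in which its standard deviation is expressed.

```python
import numpy as np

def build_prior(d_g_a, n_t_a, omega_a, sd_d_g, sd_n_t, sd_omega):
    """Return (x_a, S_a) for p bins, with S_a diagonal (no correlations)."""
    d_g_a, n_t_a, omega_a = (
        np.asarray(v, dtype=float) for v in (d_g_a, n_t_a, omega_a)
    )
    p = d_g_a.size
    x_a = np.concatenate([d_g_a, n_t_a, omega_a])  # n = 3p elements
    sd = np.concatenate(
        [np.full(p, sd_d_g), np.full(p, sd_n_t), np.full(p, sd_omega)]
    )
    s_a = np.diag(sd**2)  # variances on the diagonal, zeros elsewhere
    return x_a, s_a
```

If vertical correlations were to be introduced later, only the construction of `s_a` would need to change; the rest of the retrieval takes the matrix as given.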

[26] Adjustment of the a priori parameters **x**_{a} and uncertainties **S**_{a} allows customization of the retrieval for different cloud types, generation regimes, and geographic areas (tropical, midlatitude, etc.). Initial tests of the current improved algorithm used the temperature-dependent expressions (1), (2), and (3) for the a priori values of the size distribution parameters. While this scheme worked in a large fraction of test cases, the retrieval was occasionally unable to converge. Closer inspection showed that the measured radar reflectivity in these cases differed markedly from the value associated with the a priori size distribution parameters characteristic of that temperature. For example, the a priori parameters corresponding to the temperature in a particular bin might imply a reflectivity of +15 dBZ, while the radar measurement was −25 dBZ, a difference of 40 dB. (The reflectivities corresponding to the individual samples of the microphysical database are shown in Figure 4, together with a curve representing the reflectivity corresponding to the distributions specified by the temperature-dependent parameter expressions (1), (2), and (3). The range of *Z*_{e} values covered by a variation of ±1 standard deviation in each of the PSD parameters is also shown.)
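The size of such a mismatch can be checked before the retrieval is run. The sketch below is a minimal illustration, assuming a lognormal size distribution in the Rayleigh regime with *N*_{T} in m^{−3} and *D*_{g} in millimeters, so that the reflectivity factor comes out in mm^{6} m^{−3}. It ignores the non-Rayleigh correction and the ice-to-water dielectric conversion, and the function names are hypothetical.

```python
import math

def rayleigh_dbz(n_t_m3, d_g_mm, omega):
    """Reflectivity factor in dBZ implied by lognormal PSD parameters.

    Uses Z = N_T * D_g**6 * exp(18 * omega**2), the sixth moment of the
    lognormal distribution, with Z in mm^6 m^-3.
    """
    z = n_t_m3 * d_g_mm**6 * math.exp(18.0 * omega**2)
    return 10.0 * math.log10(z)

def prior_measurement_gap_db(prior_dbz, measured_dbz):
    """Difference in dB between the prior-implied and measured reflectivity."""
    return prior_dbz - measured_dbz
```

For the example in the text, `prior_measurement_gap_db(15.0, -25.0)` gives 40 dB, which corresponds to a factor of 10^4 in linear reflectivity.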

[27] Obviously, the microphysical state of a cloud is dependent on factors other than temperature, such as the availability of moisture and ice nuclei and convective intensity. Because knowledge of the temperature alone cannot account for this wide variation in *Z*_{e}, a procedure was adopted whereby the temperature-dependent expressions (2) and (3) were used to determine the a priori values of *D*_{g} and *ω*, but the a priori number concentration *N*_{Ta} was determined by a method taking advantage of a *Z*_{e}-IWC relationship from *Liu and Illingworth* [2000]. (A similar technique was used in the previous retrieval.) Specifically, equations (9), (11), and (21) were combined and solved for *N*_{T},

The *Liu and Illingworth* [2000] expression for IWC was then used to write the IWC term in terms of *Z*_{e}. A value of *N*_{T} was then obtained for each cloudy bin using the prescribed *D*_{ga} and *ω*_{a} values, and these *N*_{T} values were then averaged to obtain an *N*_{Ta} value for the column. In this way, both temperature and reflectivity information contribute to the selection of a priori values.

[28] A priori uncertainties (diagonal terms in (31)) were set to the standard deviations of the fits of the three distribution parameters given in (1), (2), and (3) and shown in Figure 2. The uncertainty of the width parameter was reduced by 50% to reduce the bias of the retrieval based on performance tests and following the example of *Benedetti et al.* [2003], which had a tightly constrained width parameter.

##### 2.4.4. Convergence and Uncertainties

[30] Random error components in the derived quantities *r*_{e}, IWC, and IWP are given by the customary expressions of error propagation,

Analytical expressions for the partial derivatives are easily derivable from (9), (10), and (13). The variance and covariance terms are all available in the solution matrix **S**_{x}.
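As a worked illustration of the propagation for one derived quantity, the sketch below computes the effective radius and its random uncertainty for a single bin. It assumes the lognormal closed form $r_e = \tfrac{1}{2} D_g \exp(5\omega^2/2)$ and takes as input the 2 × 2 sub-block of **S**_{x} for (*D*_{g}, *ω*) at that bin. The function name and argument layout are hypothetical.

```python
import numpy as np

def r_e_and_sigma(d_g, omega, cov):
    """Effective radius and its 1-sigma random error for one bin.

    cov is the 2x2 covariance of (D_g, omega) taken from S_x.
    The variance is grad^T cov grad, i.e. first-order error propagation
    including the covariance term.
    """
    growth = np.exp(2.5 * omega**2)
    r_e = 0.5 * d_g * growth
    grad = np.array([
        0.5 * growth,                # d r_e / d D_g
        2.5 * omega * d_g * growth,  # d r_e / d omega
    ])
    variance = grad @ np.asarray(cov, dtype=float) @ grad
    return r_e, np.sqrt(variance)
```

The off-diagonal entry of `cov` is what the earlier fixed-parameter retrievals could not supply; with all three parameters in the state vector, it is read directly from **S**_{x}.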

[31] The above expressions describe the random components of uncertainty in the retrieval output given the stated uncertainties in the inputs and the assumptions used in constructing the retrieval. Biases not accounted for in the retrieval formulation (systematic uncertainties) are discussed in section 5.