Bayesian algorithms are a common method of retrieving cloud properties from observed passive microwave brightness temperatures (TBs). In practice, such methods often perform the retrieval with predefined databases generated by cloud-resolving models. The performance of these Bayesian retrievals depends strongly on how similar the TBs in the observed and predefined databases are. Here, empirical orthogonal function (EOF) analysis was used for two purposes. The first was to illustrate how the choice of predefined simulation database affects the ability to retrieve known regional variability in cloud structure across different parts of the tropical oceans. The second was to assess the relative impact of environmental conditions versus limitations in cloud model microphysical parameterizations on retrieving different types of cloud structures. The spatial distributions of EOF coefficients in EOF space showed that the predefined databases both underrepresented and overrepresented parts of the observed manifolds. Moreover, the frequency distributions of observed and simulated cloud structures were poorly aligned. The ability to retrieve known regional variability in convective cloud structure depended more on the environmental conditions used to generate the simulated data set than on limitations in the model microphysical parameterizations. The opposite was true for stratiform rain: the dominant EOF patterns of the radiance indices for the predefined databases exhibit overproduction of large, high-density ice-phase hydrometeors for a given rain content compared with the observations. Improvements to microphysical parameterization schemes appear to be necessary to produce radiance index vectors that are consistent with observations.
 Satellite passive microwave observations have been used to retrieve tropical rainfall for the past two decades. One approach to the retrieval is to apply Bayes' theorem [Evans and Stephens, 1993; Kummerow et al., 1996; Olson et al., 1996; Evans et al., 2002; Seo and Liu, 2005]. The Bayesian algorithm inverts brightness temperatures (TBs) into rain rates by using a predefined database that relates cloud/rain structures to microwave radiances. The database is often constructed from simulations of precipitation systems [Tao and Simpson, 1993; Mugnai et al., 1993]. Hence, how well the simulations represent natural clouds is key to the successful performance of retrieval algorithms. This concern is even more important if the number of simulations used to construct the retrieval database is limited or if the structure of natural clouds varies appreciably across the retrieval domain.
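To make the role of the database concrete, the Python sketch below shows a generic form of Bayesian database retrieval in the spirit of the methods cited above. Each database entry is weighted by a Gaussian likelihood of its simulated TBs (or radiance indices) given the observation, and the retrieved rain rate is the weighted mean. The function name, the Gaussian error model, and the supplied error covariance are assumptions for illustration; this is not the TMI 2A12 code.

```python
import numpy as np


def bayesian_rain_rate(obs, db_obs, db_rain, error_cov):
    """Posterior-mean rain rate from a predefined database (illustrative sketch).

    obs       : (n_chan,) observed TBs or radiance indices
    db_obs    : (n_db, n_chan) simulated TBs or indices stored in the database
    db_rain   : (n_db,) rain rate paired with each database entry
    error_cov : (n_chan, n_chan) assumed combined observation + model error covariance
    """
    obs = np.asarray(obs, dtype=float)
    db_obs = np.asarray(db_obs, dtype=float)
    db_rain = np.asarray(db_rain, dtype=float)
    diff = db_obs - obs                      # departure of each entry from the observation
    inv_cov = np.linalg.inv(error_cov)
    # Squared Mahalanobis distance of every database entry from the observation
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # Gaussian likelihood weights, shifted by the minimum distance to avoid underflow
    weights = np.exp(-0.5 * (d2 - d2.min()))
    return float(np.sum(weights * db_rain) / np.sum(weights))
```

Entries whose simulated values lie close to the observation dominate the estimate. Any part of the observed TB manifold that the database does not populate can therefore only be matched by distant, poorly representative entries, which is why manifold similarity matters.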
 Latitudinal variability in the dominant cloud type and in the structure of precipitating storms, from the tropics to middle latitudes and even the polar regions, is well known [Houze, 1981]. But significant variability is evident even across a narrow band of latitude, such as the tropics. Petty  found a strong regional dependence of thunderstorm frequency associated with the variability of tropical cloud systems. Large discrepancies between eastern and western Pacific rainfall systems were found in rainfall retrievals based on satellite infrared and passive microwave observations [Berg et al., 2002]. Precipitation systems in the eastern Pacific are shallower, especially for small storm systems, and have a higher percentage of stratiform rain. In the western Pacific, storm systems contain more ice. Petersen and Rutledge  found that convection sampled over the western Pacific warm pool was slightly more intense, in terms of radar reflectivity structure, than that sampled over the other tropical oceanic regions.
 Given that regional and temporal variabilities exist in the radiative characteristics of tropical cloud systems, retrieving cloud properties from passive microwave observations likely requires a large set of predefined databases from a wide variety of numerical simulations. While the cloud-radiative model database supporting version six of the Tropical Rainfall Measuring Mission's [TRMM; Simpson et al., 1988, 1996] passive microwave retrieval algorithm (TMI 2A12) consists of several simulations [Olson et al., 2006], more simulations may be required. Furthermore, it is known that small differences (a few degrees) between observed and presimulated TBs can result in significantly different retrieved cloud structure [Seo and Biggerstaff, 2006]. Moreover, a necessary condition for retrieving cloud properties from passive microwave observations is that the manifolds (multidimensional relations) of TBs be similar between observed clouds and predefined databases [Panegrossi et al., 1998; Seo and Liu, 2005; Seo et al., 2007a].
 The number of dimensions in TB relations can be quite large. For example, the TRMM microwave imager [TMI; Kummerow et al., 2000, 2001] operates at five frequencies (10.65, 19.35, 21.3, 37.0, and 85.5 GHz) with four of the frequencies having both horizontal and vertical polarizations, yielding nine possible dimensions. Assessing the degree to which the TB manifolds are consistent between observed and simulated cloud systems, therefore, requires an approach that reduces the number of dimensions while simultaneously preserving the correlations between TB relations.
 In this study, we demonstrate a method based on empirical orthogonal function (EOF) analysis that may be used to visualize and quantify the similarity between the manifolds of TB relations for observed and simulated storms. We use this method to diagnose the regional variability in observed TB relations across specific regions of the tropics and to assess the relative impact of environmental conditions versus limitations in cloud model microphysical parameterizations on retrieving different types of cloud structures. While the simulations herein are somewhat limited, they can be used to illustrate potential sources of uncertainty in Bayesian passive microwave retrievals that use predefined cloud-radiative model databases. The results suggest that the retrieval uncertainty may depend strongly on cloud type and illustrate the need to use model databases constructed with environmental conditions appropriate for the region in which the retrieval is performed.
 To demonstrate the technique for evaluating similarity between observed and simulated storm structure and to understand how numerical databases may affect regional variability in retrieved cloud properties, both observed and simulated TBs are needed.
2.1. Observational Data
 TMI TBs (TRMM product 1B11) were collected at 10.65, 19.35, 37.0, and 85.5 GHz with both horizontal and vertical polarizations over four different regions across the tropical oceans (Figure 1). The regions chosen are the tropical northwest Pacific (NWP), the tropical southwest Pacific (SWP), the tropical east Pacific (EP), and the tropical Atlantic (TA). These four regions represent a broad range of cloud regimes and are expected to exhibit regional dependences in cloud structure due to differences in thermodynamic environments and large-scale forcing [e.g., Berg et al., 2002; Berg and Kummerow, 2006]. Except for the 85.5 GHz channel, which is sampled every 4.5 km, the TMI samples the atmosphere every 9 km along the scan direction.
 But the actual fields of view (FOVs) of individual measurements vary with frequency [Kummerow et al., 1998]. Hence, for most frequencies, the 9 km pixels represent oversampling of the atmosphere. One year of data, December 1999 to November 2000, was used to provide a robust set of observations. Because this study focuses on regional dependence, the data were not partitioned by season; moreover, the regional variability across the four regions is likely greater than the seasonal variability within any one region for a given year. Rain-free pixels were removed from further consideration using the 37.0 GHz polarization difference [Kummerow and Giglio, 1994; Hong et al., 1999]: if the 37.0 GHz polarization difference exceeds 40 K, the pixel is assumed to have no rain.
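The rain screening described above reduces to a single threshold test per pixel. The sketch below applies it to arrays of vertically and horizontally polarized 37.0 GHz TBs; the function and argument names are hypothetical, and the fixed 40 K threshold is the one stated in the text.

```python
import numpy as np

PD_NO_RAIN_THRESHOLD_K = 40.0  # threshold quoted in the text


def rainy_mask(tb37v, tb37h, threshold=PD_NO_RAIN_THRESHOLD_K):
    """Return True where the 37 GHz polarization difference allows rain."""
    pd37 = np.asarray(tb37v, dtype=float) - np.asarray(tb37h, dtype=float)
    # Pixels with PD greater than the threshold are treated as rain-free
    return pd37 <= threshold
```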
2.2. Simulation Experimental Design
 Three different simulations of tropical oceanic convective storm systems were performed (Table 1). They were designed to provide enough variability in simulated storm structure to illustrate the impact of the model database on the retrieval process, and to cover at least a portion of the broad range of tropical convective storm systems that matter for rainfall retrievals over the selected regions. The cases chosen were particularly well sampled by radar and high-resolution passive microwave radiometers during various field experiments conducted in support of TRMM. Data from these observations were used to help guide the numerical simulations.
Table 1. Specific Features of the Three Observational Cases Simulated for This Study
Name of Case | Description
KWAJ | An asymmetric leading-line trailing-stratiform squall line observed on 29 October 1999
KAMP | Loosely organized multicellular system with little stratiform rain observed on 19 September 1999
TOGA | Symmetric leading-line trailing-stratiform squall line with broad stratiform rain observed on 23 February 1993
 The 29 October 1999 case observed during the Kwajalein (KWAJ) Experiment represents moderately organized Tropical Cloud Clusters [TCC; Houze and Hobbs, 1982; Houze and Churchill, 1987] that contain both deep convection and small regions of stratiform rain, produced in part by dissipating convection [Biggerstaff and Listemaa, 2000]. The 23 February 1993 case observed during the Tropical Ocean Global Atmosphere (TOGA) Coupled Ocean-Atmosphere Response Experiment (COARE) represents well-organized leading-line trailing-stratiform squall lines. These systems produce a broad region of stratiform rain through advection of hydrometeors out of the tops of convective cells, followed by growth through vapor deposition in a mesoscale updraft. This particular squall line was documented by Jorgensen et al. .
 Both the KWAJ and TOGA simulations were based on environments with thermodynamic characteristics similar to those of the regions in which the retrievals are performed (Figures 2 and 3). In contrast, the third simulation was chosen to evaluate the impact of using a database generated in a significantly different thermodynamic environment. The Keys Area Microphysics Project [KAMP; Biggerstaff et al., 2005] storm observed on 19 September 1999 represents unorganized deep convective clusters that produce little stratiform rain.
 While the simulations conducted here are more limited than the official TRMM database, they serve to demonstrate the analysis method used to quantify the similarity between the observed and simulated manifolds of TBs. Moreover, the selected simulations provide an efficient data set for capturing first-order concerns in constructing databases for passive microwave retrievals of cloud properties. We acknowledge that the model does not simulate the shallow, warm-rain clouds that may be important to the rainfall climatology of the east Pacific. However, the model does produce a mixture of moderate to strong convective cells that represent the range of observed deep convection.
2.3. Simulation Method and Generation of Radiance Indices
 The convective systems were simulated using the Collaborative Model for Multiscale Atmospheric Simulation [COMMAS; Skamarock and Klemp, 1993; Wicker and Wilhelmson, 1995], a nonhydrostatic, adaptive-grid cloud-resolving model. All simulations reported here used the same microphysical scheme, which is based on Tao and Simpson  and is itself a hybrid version of Lin et al. . The scheme uses a two-class liquid water (cloud water and rain) and three-class ice-phase (cloud ice, snow, and graupel) parameterization. For rain, snow, and graupel, the intercept parameters of the exponential [Marshall and Palmer, 1948] size distributions were 22 × 10⁶, 3 × 10⁶, and 10 × 10⁶ m−4, respectively. The densities of snow and graupel were 100 and 600 kg m−3, respectively. A more complete description of the microphysics and initialization procedures can be found in Hristova-Veleva  and Biggerstaff et al. .
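For an exponential size distribution N(D) = N0 exp(−λD) of spheres with bulk density ρ, the mass content W fixes the slope through W = πρN0/λ⁴, the relation used in Lin-type bulk schemes. The sketch below evaluates λ for the intercepts and densities quoted above. The spherical-particle assumption and the 1000 kg m−3 rain density are added assumptions (not stated in the text), and the names are hypothetical. It shows how, at equal mass content, a smaller intercept and lower density (snow) imply a smaller slope and thus larger mean particle size.

```python
import math

# Intercept parameters (m^-4) and bulk densities (kg m^-3) quoted in the text;
# the liquid-water density of 1000 kg m^-3 for rain is an added assumption.
SPECIES = {
    "rain": {"n0": 22e6, "rho": 1000.0},
    "snow": {"n0": 3e6, "rho": 100.0},
    "graupel": {"n0": 10e6, "rho": 600.0},
}


def slope_parameter(mass_content, n0, rho):
    """Slope lambda (m^-1) of N(D) = n0 * exp(-lambda * D) for a mass content in kg m^-3."""
    # Integrating rho * (pi / 6) * D**3 * N(D) over all D gives pi * rho * n0 / lambda**4
    return (math.pi * rho * n0 / mass_content) ** 0.25


def mass_content_from_slope(lam, n0, rho):
    """Inverse of slope_parameter, useful as a consistency check."""
    return math.pi * rho * n0 / lam**4


# Example: slope and mean diameter (1/lambda) for each species at 1 g m^-3
for name, p in SPECIES.items():
    lam = slope_parameter(1e-3, p["n0"], p["rho"])
    print(f"{name}: lambda = {lam:.0f} m^-1, mean diameter = {1e3 / lam:.2f} mm")
```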
 All model-generated variables, including microphysical mass contents per unit volume (densities), were archived every 5 min during the developing and mature stages of each convective system. The model output was recorded at 3 km × 3 km horizontal resolution and 500 m vertical resolution, with higher resolution below 0.95 km. All hydrometeors were used as inputs to a radiative transfer model based on a one-dimensional second-order Eddington approximation [Weinman and Davies, 1978; Kummerow, 1993] to calculate microwave TBs at the TMI frequencies. The TBs were computed at the TMI viewing angle. The calculated TBs were then convolved with a frequency-dependent antenna gain function over their effective fields of view (EFOVs); details of the EFOVs are found in Kummerow et al. . Nominal (12 km × 12 km) TB pixels were constructed from the convolved TBs, and rain-free nominal pixels were eliminated in the same manner as in the observational data.
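The antenna-pattern convolution described above can be approximated, for illustration, by a gain-weighted average of model-grid TBs. The sketch below assumes a two-dimensional Gaussian gain pattern specified by hypothetical full widths at half maximum (FWHM); the actual frequency-dependent TMI gain functions and EFOVs are those described in the cited Kummerow et al. work and are not reproduced here.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def convolve_gaussian_efov(tb_grid, dx_km, center_km, fwhm_km):
    """Gain-weighted average TB over an assumed Gaussian footprint (sketch).

    tb_grid   : (ny, nx) TBs on the model grid
    dx_km     : grid spacing in km (3 km for the archived model output)
    center_km : (x, y) footprint center in km from the grid origin
    fwhm_km   : (x, y) hypothetical footprint widths at half maximum, in km
    """
    tb_grid = np.asarray(tb_grid, dtype=float)
    ny, nx = tb_grid.shape
    x = (np.arange(nx) + 0.5) * dx_km      # cell-center coordinates
    y = (np.arange(ny) + 0.5) * dx_km
    xx, yy = np.meshgrid(x, y)             # both of shape (ny, nx)
    sx = fwhm_km[0] * FWHM_TO_SIGMA        # convert FWHM to standard deviation
    sy = fwhm_km[1] * FWHM_TO_SIGMA
    gain = np.exp(-0.5 * (((xx - center_km[0]) / sx) ** 2 + ((yy - center_km[1]) / sy) ** 2))
    # Normalized weighted mean, so a uniform scene returns its own TB
    return float(np.sum(gain * tb_grid) / np.sum(gain))
```

Because the gain is normalized, the convolved TB is a weighted mean. A wide low-frequency footprint therefore blends rain and rain-free columns, which is one reason the nominal pixels are screened for rain after convolution.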
 In general, TBs at TMI frequencies are most affected by upwelling radiation along the radiometer's viewing path. Hence, TBs typically reflect the physical properties of the underlying clouds, and a multivariate analysis of the correlations among TBs provides information on cloud properties [Biggerstaff et al., 2006]. Nevertheless, variability in atmospheric water vapor and sea-surface emissivity may also affect the TBs. To reduce the influence of this background variability, which is unrelated to the precipitation signal, the attenuation and scattering indices [Petty, 1994] at the TMI frequencies are used in place of the nine TMI TBs. These radiance indices are also used by the Goddard Profiling Algorithm (GPROF) [Kummerow et al., 2001; Olson et al., 2006].
 The attenuation index (P) is a normalized polarization difference ranging from 0 to 1, with small values of P denoting opaque liquid cloud and values near 1 indicating a cloud-free field of view. P is insensitive to scattering by ice particles and depends primarily on the emission of radiation by liquid hydrometeors. In contrast, the scattering index, S, represents volume scattering associated with frozen precipitation aloft; large values of S correspond to strong ice scattering.
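As a sketch of how the two unmodified indices behave, the functions below follow the commonly cited form of Petty's [1994] definitions. P is the polarization difference normalized by its cloud-free value. S is the depression of the vertically polarized TB below what a nonscattering cloud with that P would emit, using a reference temperature of 273 K. The cloud-free TBs are assumed to be supplied by a separate background estimate, and the function names are hypothetical.

```python
T_C = 273.0  # K, reference temperature assumed for an opaque, nonscattering cloud


def attenuation_index(tv, th, tv_clear, th_clear):
    """Normalized polarization difference P (1 = cloud-free, 0 = opaque)."""
    return (tv - th) / (tv_clear - th_clear)


def scattering_index(tv, th, tv_clear, th_clear, t_c=T_C):
    """Scattering depression S (K) of the vertically polarized TB."""
    p = attenuation_index(tv, th, tv_clear, th_clear)
    # TB expected from an absorbing but nonscattering cloud, minus the observed TB
    return p * tv_clear + (1.0 - p) * t_c - tv
```

In this form, both a clear scene and an opaque, nonscattering rain cloud give S = 0. Only a TB colder than the emission-only expectation, as produced by ice scattering, gives S > 0.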
 To aid intuitive interpretation of the radiance indices in terms of TB behavior, the radiance indices of Petty  are modified here. We define a modified attenuation index, Pm, and a modified scattering index, Sm, such that
The modified attenuation and scattering indices have the same order of magnitude and behavior as TBs [Seo et al., 2007a]. That is, greater values of Pm correspond to larger liquid hydrometeor volume and higher emission, just like greater values of the lower-frequency TMI TBs. Similarly, larger negative values of Sm correspond to larger volumes of high-density ice and stronger scattering signatures, just like the higher-frequency TMI TBs. For this study, we use the modified attenuation indices at 10.65, 19.35, 37.0, and 85.5 GHz and the modified scattering index at 85.5 GHz, and we refer to this set of modified radiance indices with the notation I. Hence, a column vector I, containing four attenuation indices and one scattering index, is defined as

$$\mathbf{I} = \left[P_{m10},\ P_{m19},\ P_{m37},\ P_{m85},\ S_{m85}\right]^{T},$$
where the subscripts of the components of I denote the TMI channel frequencies. The modeled rain rates and their corresponding sets of modified radiance indices illustrate one way to construct the predefined database of an admittedly limited, cloud-resolving-model-based Bayesian retrieval algorithm.
2.4. Classification of Rain Pixels
 Since the same Is can be associated with significantly different rain rates depending on whether the cloud is convective or stratiform [Seo and Biggerstaff, 2006], we partition both the observed and simulated raining pixels into three categories based on the convective fraction that is contained within the nominal (12 km × 12 km) pixel.
 For the observed Is, the echo classification is based on the fraction of the nominal 12 km × 12 km area that is covered by convective rain. To determine the convective fraction, convF, the echo classification assigned by the TRMM Precipitation Radar (PR) over the nominal area is used. The PR resolution is approximately 4 km × 4 km, yielding about nine samples over the nominal area.
 For the simulated Is, each column in the model output is assigned a convective, stratiform, or no-rain classification based on the surface rain rate, integrated cloud water content, and vertical motion following the technique used by Tao and Simpson . The total number of convective columns in the nominal 12 km × 12 km area divided by the total number of grid columns over the area yields convF.
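The convective fraction defined above is a simple count ratio. The sketch below computes it from a small array of per-column labels, for example the roughly 3 × 3 PR footprints or the 4 × 4 model columns (3 km spacing) inside one nominal pixel. The label strings and function name are hypothetical, and the category thresholds applied afterward are not reproduced here.

```python
import numpy as np


def convective_fraction(column_class):
    """convF: convective columns divided by all columns in one nominal pixel.

    column_class : 2-D array-like of labels ("convective", "stratiform", "no_rain")
                   for the columns or PR footprints inside a 12 km x 12 km pixel
    """
    labels = np.asarray(column_class)
    # Rain-free columns stay in the denominator, as in the definition above
    return float(np.count_nonzero(labels == "convective")) / labels.size
```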
 Based on the convF, all rain pixels in the observed and simulated raining pixels were classified into three categories as follows:
 If the nominal fields of view were completely covered by precipitation, category 1 would contain mostly convective rain, category 2 a mixture of convective and stratiform rain, and category 3 mostly stratiform rain. However, category 2 and category 3 events could also consist of just convective rain mixed with varying degrees of echo-free regions. Given that the PR has rather low sensitivity (∼0.7 mm/h), it is difficult to know whether there is weak stratiform rain within a given PR nonconvective pixel. It would have been possible to treat such areas as “rain-free” and use only the heavier, PR-measurable stratiform rain in categories 2 and 3, but doing so would have introduced a bias in the analysis. We used one month of TRMM observations over NWP as a guide and considered just category 3 rainy TMI pixels. Every such pixel contained some PR-measurable stratiform precipitation, and only 17% had PR convective classifications within the field of view. Hence, the majority of category 3 pixels in the observed data set are most likely stratiform precipitation rather than isolated convection mixed with rain-free regions.
2.5. Distribution of Echo Classification
 Applying the cloud classification scheme to the observed rainy satellite FOVs yielded a distribution of about 12% convective, 14% mixed, and 74% nonconvective pixels, with relatively little variability across the oceanic regions (Table 2). Although convective clouds occupy a small portion of the rainy pixels in all regions, their contribution to total rain is significant because their mean rain rate is about six times greater than that of stratiform rain [Fu and Liu, 2003].
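A rough illustration of why a small convective fraction can still matter (an illustration only, not a result of this study): suppose the 12% of convective pixels rain at six times the rate R of the remaining 88% of rainy pixels, crudely treating the mixed pixels as if they rained at the stratiform rate. The convective share of the total rain would then be

$$\frac{0.12 \times 6R}{0.12 \times 6R + 0.88 \times R} = \frac{0.72}{1.60} = 0.45,$$

that is, nearly half of the rain would come from about one-eighth of the rainy pixels.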
Table 2. Percentage (With Number of Occurrences in Parentheses) of Each Category in All Rain Pixels
 In contrast to the observations, the simulated storms had greater variability in the distribution of rain classifications (Table 3). In particular, the KWAJ simulation had only 6.5% of its rainy pixels classified as convective. This is not surprising, given that the KWAJ cloud system was relatively small, with convection along the periphery of a broader stratiform region. Excluding KWAJ, the simulations are biased toward a greater percentage of convective rain pixels and underrepresent nonconvective pixels compared with the observational database.
Table 3. Percentage of Each Category in All Rain Pixels of the Model-Generated Databases
2.6. Comparison of Simulated Storm System Structure
 Contoured frequency by altitude diagrams (CFADs) of radar reflectivity for the three simulations show the differences in storm structure for each echo classification (Figure 4). For convection, the KWAJ simulation produced the least variability in cloud structure, while KAMP exhibited a wide-ranging distribution of convective cloud structures. Consistent with the higher convective available potential energy (CAPE) in the KAMP environment, the KAMP simulation also had the strongest convection. But it also had more convective cells with relatively low reflectivity throughout the troposphere than the other simulations.
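A CFAD is built by histogramming reflectivity separately at each altitude. The sketch below normalizes each level by its own count of echo points, which is one common choice. Other normalizations, such as dividing by the total count over all levels, are also used, and the function name and NaN convention for echo-free points are assumptions.

```python
import numpy as np


def cfad(refl_dbz, dbz_edges):
    """Contoured frequency by altitude diagram (sketch).

    refl_dbz  : (n_profiles, n_levels) reflectivity profiles in dBZ, NaN where echo-free
    dbz_edges : 1-D sequence of reflectivity bin edges
    Returns an (n_levels, n_bins) array; each row is the relative frequency of
    reflectivity at that altitude (rows without any echo stay zero).
    """
    refl_dbz = np.asarray(refl_dbz, dtype=float)
    n_levels = refl_dbz.shape[1]
    freq = np.zeros((n_levels, len(dbz_edges) - 1))
    for k in range(n_levels):
        vals = refl_dbz[:, k]
        vals = vals[np.isfinite(vals)]              # keep only points with echo
        counts, _ = np.histogram(vals, bins=dbz_edges)
        if counts.sum() > 0:
            freq[k] = counts / counts.sum()         # normalize level by level
    return freq
```

Computing one CFAD per rain category and per simulation gives diagrams comparable to those discussed here: a narrow row indicates uniform structure at that altitude, and a broad row indicates a mixture of weak and strong echoes.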
 In the nonconvective (category 3) classification, KWAJ exhibited a very broad distribution of cloud structures, while TOGA had a much narrower distribution. The narrower distribution in TOGA results from the development of a large, well-defined stratiform region in that simulation compared with the other simulations.
 While the observational category 3 classification is likely mostly stratiform rain, the CFADs for the KAMP and KWAJ simulations suggest that the category 3 classification in these databases may be biased by convection mixed with rain-free regions. This is particularly true for KAMP, whose category 3 CFAD does not exhibit a secondary maximum in frequency of occurrence for weak radar reflectivity between 5 and 10 km altitude. In other words, compared with the other simulations, the distribution of KAMP category 3 profiles is biased toward strong reflectivity at middle to upper levels.
2.7. EOF Analysis Methods
 The radiance index vector defined above reduces the number of dimensions in the TB manifold space from nine to five. TBs measured at multiple channels are often strongly correlated because the radiation passes through the same cloud for a given surface and ambient atmospheric environment [Seo and Liu, 2005; Biggerstaff et al., 2006; Seo and Biggerstaff, 2006]. The degree of cross correlation is associated with the vertical distribution of hydrometeors within a cloud column and with the closeness of the radiometer channel frequencies. Since the Is are simple functions of the TBs, it should be possible to further reduce the number of dimensions using EOF analysis, as in Seo and Liu  and Seo et al. [2007b], allowing both qualitative and quantitative evaluation of how well model databases represent observational variability. Hence, EOF analysis is used here to examine correlations among the five radiance indices (Pm10, Pm19, Pm37, Pm85, and Sm85, where subscripts denote the TMI channel frequencies) and to determine the correlated structures of variability in the indices. This analysis was conducted separately for each rain classification category in both the observational and simulated data sets.
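 To perform the EOF analysis, we let Ii represent the ith column vector, which contains the five radiance indices. The spatial area average of the indices for each category of each oceanic region is given by

$$\bar{\mathbf{I}} = \frac{1}{M}\sum_{i=1}^{M}\mathbf{I}_i,$$

where M denotes the number of data points collected over each region during the period. The anomaly at a given location, I′i, is given by

$$\mathbf{I}'_i = \mathbf{I}_i - \bar{\mathbf{I}}.$$

These anomalies are then expressed with respect to the EOFs of each observational or simulation data set such that

$$\mathbf{I}'_i = \sum_{k=1}^{5} a_{ik}\,\mathbf{e}_k,$$

where e_k is the kth EOF (the kth eigenvector of the covariance matrix of the anomalies) and a_ik is the corresponding EOF coefficient of the ith data point.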
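The decomposition above can be computed with a standard eigen-analysis of the anomaly covariance matrix. The Python sketch below is one minimal way to do so, assuming the indices for one region and one rain category are stacked as rows of an M × 5 array. The function name and the use of the sample covariance (rather than, for example, the correlation matrix) are our assumptions, not necessarily the authors' exact choices.

```python
import numpy as np


def eof_analysis(indices):
    """EOFs of radiance-index anomalies via the covariance eigenproblem (sketch).

    indices : (M, 5) array, one row per rainy pixel: [Pm10, Pm19, Pm37, Pm85, Sm85]
    Returns (eofs, coeffs, explained):
      eofs      : (5, 5) array whose columns are EOFs ordered by decreasing variance
      coeffs    : (M, 5) EOF coefficients a_ik of each pixel
      explained : (5,) fraction of total variance carried by each EOF
    """
    indices = np.asarray(indices, dtype=float)
    mean = indices.mean(axis=0)                  # area-average index vector
    anom = indices - mean                        # anomalies I'_i
    cov = anom.T @ anom / (anom.shape[0] - 1)    # 5 x 5 sample covariance matrix
    evals, evecs = np.linalg.eigh(cov)           # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]              # reorder to descending variance
    evals, eofs = evals[order], evecs[:, order]
    coeffs = anom @ eofs                         # projections onto each EOF
    explained = evals / evals.sum()
    return eofs, coeffs, explained
```

The sign of each EOF returned by an eigen-solver is arbitrary, so patterns from different data sets (for example, an observed region and a simulation) should be compared allowing for a sign flip.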
3. Multivariate Relations of Attenuation and Scattering Indices
 Upwelling radiation along the radiometer's viewing path is mostly affected by hydrometeors. Hence, the TBs or radiance indices at the TMI channels reflect the physical properties of the underlying clouds, and a multivariate analysis of the correlations among the radiance indices provides information on cloud properties [Biggerstaff et al., 2006]. EOF analysis therefore also reveals the correlation structures of variability across the TMI channels. Based on the EOF analysis, this section examines the multivariate relations (that is, EOFs) of the observed and simulated radiance indices for each category of cloud classification.
 It should be understood that physical interpretations of the EOF analysis are best provided in a general sense. A mature convective cell has much different hydrometeor structure than a stratiform cloud and that difference is easily discerned in the EOF framework. Likewise, convective cloud columns that contain an abundance of rain or just precipitation-sized ice have distinguishable characteristics. While it is possible that more than one physical interpretation may explain the same EOF structure, actual profiles of cloud properties from the simulations are provided in the Appendix to demonstrate that the interpretations provided here are reasonable.
3.1. Observed Radiance Indices
 The multivariate relations in the observed radiance indices are examined for each region. The regional separation of observations is expected to help understand the regional variability in the tropical rain clouds and serve as a guideline for the selection of predefined simulations.
3.1.1. Convective Rain Classification
 Eigenvalues of the first EOFs indicate that the first EOFs explain about 77% of the respective total variances across the ocean domains (Figure 5). The first EOF structures of the radiance indices over each ocean exhibit very similar correlation patterns, with a weak-to-moderate warming (positive value) of the attenuation indices (Pm10, Pm19, Pm37, and Pm85) strongly coupled with a moderate-to-strong cooling (negative value) of the scattering index (Sm85) (Figure 5a). This pattern is very similar to that found over the entire TRMM coverage area, implying that increases in liquid hydrometeors tend to accompany increases in ice hydrometeors aloft, and vice versa [Seo et al., 2007b]. This structure is consistent with deep convective clouds that contain rain underlying high-density ice-phase precipitation. The rain layers emit radiation that warms the TBs (that is, increases the attenuation indices), while the ice layers scatter upwelling radiation from the rain layers and lower the TBs (that is, decrease the scattering index). The EOF analysis shows that the majority of convective clouds over the oceans generate TB variability patterns associated with deep, mature-phase convection.
 The second EOF patterns explain about 12%–14% of the variance in TB relations for category 1 cloud systems. The second EOF pattern differs markedly from the first, showing a simultaneous warming (positive amplitude) or cooling (negative amplitude) of all radiance indices (Figure 5a). These radiance index correlation patterns are likely associated with relatively shallow or developing convective clouds containing mostly liquid (for positive amplitudes) or with tall dissipating convective clouds containing mostly ice (for negative amplitudes).
 Although the second EOFs of the four regions exhibit similar warming/cooling patterns, they vary considerably more among regions than the first EOFs do. The correlation patterns over NWP and SWP differ from those over EP and TA; in particular, the former (latter) regions exhibit relatively small (large) amplitude in the 85 GHz attenuation index. The small amplitude over NWP and SWP might be attributed to saturation of the 85 GHz TB by warming from more abundant liquid hydrometeors than over EP and TA. Given the greater variability in the second EOF structures compared with the first EOF structure, there appears to be larger regional variability in shallow rain and/or tall ice clouds than in deep convective clouds over the four oceanic regions. Thus, the four oceans exhibit somewhat different variability patterns in the observed TBs. The comparison among regions implies that convective clouds seen by the TMI radiometer have regional variability in vertical cloud structure, which could include both cloud top height and microphysical constituents.
 To investigate the two different variability patterns, the cloud top heights of all convective rain pixels in the TRMM PR 2A23 product were analyzed. Overall, about 70% of the convective cloud top heights in the selected oceanic regions (Figure 6) are below ∼6 km. The distribution of PR-estimated convective cloud types is consistent with the trimodal distribution of tropical convective clouds noted in Johnson et al. .
 The dominance of shallow convection in the PR distribution and of deep, mature convection in the TMI EOF analysis requires explanation. First, the TMI field of view is relatively large compared with that of the PR, and small, isolated, shallow precipitating cumuli do not provide sufficient radiometric signal to be readily observed by the TMI. Hence, the TMI observations are likely biased toward deep clouds with high precipitation content. Second, the sensitivity of the PR is approximately 17 dBZ [Kummerow et al., 2000], which is insufficient to observe low concentrations of low-density snow and small cloud-ice particles in the upper portions of the cloud. Hence, the PR likely underestimates the true cloud tops. Regardless, because these sampling biases are the same in every region, they do not explain the observed regional variability in either the TMI or the PR observations. In particular, the large relative frequency of shallow convective clouds over EP agrees well with the large amplitude of the 85 GHz attenuation index in the second EOF pattern, which is consistent with warm rain and little precipitation-sized ice aloft.
3.1.2. Mixed and Nonconvective Classification
 The first EOF patterns of categories 2 and 3 look very similar to those of category 1, with a weak-to-moderate warming of the attenuation indices and a moderate cooling of the scattering index (Figure 5b). The first EOF pattern, with its scattering signature at 85 GHz, indicates that the nominal TMI footprint area (12 × 12 km2) contains mature-to-dissipating convective cells [e.g., Biggerstaff and Houze, 1991, 1993] with rimed hydrometeors embedded within or adjacent to the stratiform rain, or a few isolated convective cells. It has been suggested that strong scattering signatures may also be formed by an abundance of low-density snow. According to calculations by Seo and Liu , about 10 kg m−2 of snow would be required for a 30 K cooling at 89 GHz. Assuming a 5 km snow layer, that equates to an average snow water content of 2 g m−3. Such high-content snow layers, devoid of higher-density ice particles, seem unlikely over the tropical regions considered here. The subtle decrease in the amplitude of the scattering indices between categories 1 and 3 suggests that the convection in category 3 is associated with a weaker scattering signature for a given rain column.
 The second EOFs are significantly different from those in category 1. There is pronounced reversal in the sign of the correlation between the 85 GHz attenuation index and the other attenuation indices. The distinct minimum in the attenuation indices at 85 GHz and maxima at 37 GHz suggests the rain rates are saturating 85 GHz while providing significant signal at lower frequencies. In addition, the noticeable difference between the 85 GHz attenuation and scattering indices suggests that the clouds contributing to the second EOF structures are devoid of high-density ice despite having enough rain to saturate the 85 GHz attenuation indices. While the second EOFs in category 1 are more consistent with different stages in convective cell life cycle, the second EOFs in categories 2 and 3 are more indicative of stratiform cloud.
 As with the convective classification, the EP region appears to be somewhat unique in terms of the second EOF structure in the mixed and nonconvective classifications. The lower amplitudes of the 10 and 19 GHz attenuation indices suggest that the rain rates of stratiform clouds in regions of mixed convective and stratiform rain might be lower over EP than over the other oceanic regions. Lower overall rain rates would also tend to produce less saturation at 85 GHz. Physically, the lower frequency of deep convective clouds over EP likely results in weaker seeding of stratiform rain regions than over the other oceanic regions, as both the depth of the seeding layer and the ice concentrations would likely be lower than in more active regions of deep convection. Rutledge and Houze  illustrated the importance of convective seeding in producing stratiform precipitation in linear mesoscale convective systems (MCSs).
 The EOF analysis here shows how the multivariate relations in observed TBs can be used to characterize cloud structure, reflecting the regional variability in raining cloud systems. In particular, the TMI measurements are able to distinguish the uniqueness of cloud structures (that is, vertical distribution of hydrometeors) over EP as compared to the other oceanic regions.
3.2. Model-Generated Radiance Indices
 Cross correlations between the TMI channels for the three different simulation data sets are investigated in a similar manner to the observed radiance indices and are compared qualitatively with the observational data. Such an analysis is beneficial for understanding the applicability of the predefined model databases in retrieving cloud structure using TMI TBs over the TRMM satellite-covered oceans and diagnosing potential effects of spatially varying environments in satellite rain retrievals.
3.2.1. Convective Cloud Classification
 While the observed category 1 rain pixels exhibit a strong warming for the attenuation indices and a modest cooling for the 85 GHz scattering index, the simulated category 1 rain pixels exhibit a weak warming for the attenuation indices and a pronounced cooling for the scattering index (Figure 7a). In terms of the correlation between radiation indices, the simulations agree well with the observations over the four oceans, where the amounts of liquid and large, high-density ice in deep convective clouds are well balanced. In terms of the eigenvalues of the first EOF, the KWAJ and KAMP simulations produce less variability in convective rain structures than is found in the observations. The lower eigenvalue for the first EOF in the TOGA simulation and the structure of its second EOF, with a distinct maximum at the 19 GHz attenuation index and positive correlations between the attenuation and scattering indices, suggest a greater number of developing convective clouds that contain considerable liquid water but little high-density ice. Indeed, the CFAD for TOGA category 1 clouds (Figure 4) shows a broad distribution of reflectivity values above the freezing level and a narrow distribution below it, indicating the presence of profiles with moderate rain content and weak reflectivity aloft. Given the leading-line trailing-stratiform structure of the TOGA simulation, the two-dimensional forcing of new convective elements along the leading edge likely produced a greater frequency of developing cells than in the KWAJ and KAMP simulations, where the convection was less well organized.
 While the structures of the first EOFs are similar in the observations and the simulations, the more negative amplitude of the scattering index in the simulated first EOF patterns, relative to the observed patterns, reveals a tendency for the simulations to overproduce graupel for a given rain content [Biggerstaff et al., 2006; Seo and Biggerstaff, 2006].
 There are subtle differences in the structure of the second EOFs as well. The observed second EOFs lack the distinct maxima in attenuation index at 19 GHz found in the simulations. Additionally, the positive amplitude of the scattering index is significantly stronger in the observations compared to the simulations. As noted previously, the observed second EOF structure is consistent with both shallow convective clouds and deep clouds that contain mostly ice. In contrast, the second EOF structure in the simulations is more consistent with clouds containing higher concentrations of rain water with more 85 GHz scattering than the observed clouds. The simulated signal is consistent with developing deep convection. To the extent that the simulations do not produce shallow, warm-rain convective clouds, there is less variability in the simulated cloud structures than the observed convective cloud type.
 To summarize, for category 1 clouds, the multivariate relations of the simulations representing the three different environments are almost identical to each other. That is, the three simulations are likely to produce quite similar vertical structures in deep convective clouds. Discounting the overproduction of graupel and the role of shallow warm rain in the global distribution of latent heating, the multivariate relations of the simulations are similar to those of the observations representing the four different oceans.
3.2.2. Mixed and Nonconvective Cloud Classification
 Among the simulations, the first EOF patterns in categories 2 and 3 have a weak-to-moderate warming for the attenuation indices and a strong cooling for the scattering index of 85 GHz, which are very similar to the first EOF patterns of the observations (Figures 7b and 7c). However, the overproduction of graupel in the simulated storms and its impact on the amplitude of the scattering index are evident.
 The second EOFs in category 2 exhibit less pronounced maxima at 19 GHz, a stronger warming amplitude in the scattering index, and less saturation at 85 GHz compared with the second EOFs in category 1. Taken together, this variability pattern is consistent with clouds of relatively low rain content with little potential for scattering. Indeed, this variability pattern is indicative of weak stratiform clouds, which differs from the observed category 2 EOFs that exhibit a pronounced saturation effect at 85 GHz. Perhaps graupel in the simulated stratiform clouds reduces the saturation effect at 85 GHz by slightly reducing the upwelling energy, keeping this frequency from reaching saturation as often.
 The only simulation that approaches the variability structure of observed category 3 clouds is the TOGA simulation, which has a pronounced minimum in the 85 GHz attenuation index in the second EOF. The classic leading-line trailing-stratiform squall line system produced by the TOGA run was the only simulation with a broad region of stratiform rain with relatively more columnar liquid water content than either KWAJ or KAMP. The higher columnar liquid water content apparently saturated the higher-frequency attenuation indices. In contrast, the KAMP simulation produced a loosely organized convective cluster with several isolated cells along the northern periphery of the system. Both simulations had about 21% category 1 and about 66% category 3 classification (Table 3). Given the spatial distribution of rainfall in both simulations, category 3 in TOGA represents modeled stratiform rain while category 3 in KAMP represents isolated convection mixed with echo-free regions.
 For category 3 classification, the second EOF structures for KAMP and TOGA are inherently different. KAMP shows that the attenuation indices have the same sign, with increasing magnitude at higher frequencies, which is what one might expect for clear air contributions. The second EOF for TOGA, on the other hand, shows a decreasing contribution with increasing radiometer frequency and a pronounced minimum at 85 GHz, which is consistent with the category 3 observations. Given the differences between the TOGA and KAMP second EOFs in the category 3 clouds, it is likely that the reduction at 85 GHz in observed category 3 clouds is associated with weak "averaged" stratiform rain and not echo-free regions.
4. Comparisons of Observations and Simulations in EOF Space
 In the previous section, differences between the observations and the simulations were discussed in terms of EOF structure (i.e., in terms of the multivariate TB correlation). Here, we examine the distribution of the amplitudes of each EOF by plotting the joint frequency distribution of the first and second EOF coefficients in EOF space. In other words, the figures are arranged such that the abscissa is the amplitude of the first EOF of the observational database of a given rain type and the ordinate is the amplitude of the second EOF of that observational database.
 To assess the differences between the observations and simulations more thoroughly, the simulated perturbation radiance indices were projected onto the EOFs of the observed radiance indices in the same category. By doing so, the simulated TBs can be evaluated within the observational manifold. That is, the perturbation radiance index vector for a given rain pixel of a given simulation data set (I′(s)) is projected onto the ith EOF (ei(o), i = 1, 2) of a given observational data set. Then, the projected amplitude, ai(p), can be described as
$$a_i^{(p)} = \mathbf{I}'^{(s)} \cdot \mathbf{e}_i^{(o)} = \left(\mathbf{I}^{(s)} - \overline{\mathbf{I}}^{(s)}\right) \cdot \mathbf{e}_i^{(o)}, \qquad i = 1, 2,$$
where Ī(s) denotes the mean radiance index vector for each simulation data set. Hence, the occurrence frequency of the two projected EOF coefficients (a1(p) and a2(p)) is compared to that of the two EOF coefficients (a1(o) and a2(o)) of a given observational data set. In this manner, the two projected EOF coefficients in the observational EOF space can serve as an efficient measure of the proximity of the predefined simulation databases to the observations in a radiative sense, specifically with respect to the variation of cloud structures. It should be noted that the first two observational EOFs explain a significant portion (88%–92%) of the total variability of the observational data.
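 The projection step can be illustrated with a minimal NumPy sketch. It assumes each data set is stored as an (n_pixels, n_indices) array of radiance indices, computes the observed EOFs as eigenvectors of the sample covariance matrix, and subtracts each data set's own mean before projecting, as in the expression for ai(p). The synthetic arrays and the helper names leading_eofs and project_onto are hypothetical stand-ins, not the TMI or model databases.

```python
import numpy as np


def leading_eofs(indices: np.ndarray, n: int = 2):
    """Mean vector, leading n EOFs (as columns), and their eigenvalues.

    indices has shape (n_pixels, n_indices).
    """
    mean = indices.mean(axis=0)
    cov = np.cov(indices, rowvar=False)        # sample covariance (ddof=1)
    evals, evecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n]        # keep the n largest
    return mean, evecs[:, order], evals[order]


def project_onto(indices: np.ndarray, eofs: np.ndarray) -> np.ndarray:
    """a_i = (I - mean(I)) . e_i, using the data set's OWN mean vector."""
    return (indices - indices.mean(axis=0)) @ eofs


rng = np.random.default_rng(42)
# Hypothetical synthetic stand-ins for five radiance indices.
obs = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 5))
sim = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))

obs_mean, e_obs, lam = leading_eofs(obs)
a_obs = project_onto(obs, e_obs)               # a_1^(o), a_2^(o)
a_proj = project_onto(sim, e_obs)              # a_1^(p), a_2^(p)

# Fraction of the total observed variance carried by the first two EOFs.
explained = lam.sum() / obs.var(axis=0, ddof=1).sum()
print(f"first two EOFs explain {100 * explained:.1f}% of the observed variance")
```

Because the simulated perturbations are formed about the simulation's own mean, a constant offset between the simulated and observed means does not by itself shift a1(p) and a2(p); the comparison focuses on the shape of the variability.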
 The frequency of occurrence of a1(o) and a2(o) in the first and second EOF space of the observations is displayed in Figures 8, 9, and 10 using black solid contours. The dotted contour shows the outer extent of the frequency distribution for each simulated database when the abscissa and ordinate are taken as the first and second EOFs of that simulated database itself. The shaded contours show the frequency of occurrence of a1(p) and a2(p) obtained by projecting the simulated perturbation radiance vectors onto the observed EOFs. To aid interpretation, note that in all the data sets the x axis corresponds to the first EOF and represents relatively deep clouds, while the y axis corresponds to the second EOF, with positive values representing warm clouds having mostly liquid hydrometeors and negative values representing cold clouds having mostly ice hydrometeors. The large number of slightly negative amplitude pairs reflects the fact that many of the profiles in the database are similar to, but slightly weaker than, the mean profile. The frequency distribution spreads broadly toward more intense signatures associated with the extreme convective events in the database, i.e., the cloud profiles with the greatest hydrometeor contents.
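 A joint frequency distribution of this kind can be sketched as a two-dimensional histogram of EOF amplitude pairs. The sketch below assumes percentages of all pixels on a regular grid; the bin edges, the synthetic amplitudes, and the name joint_frequency are illustrative choices, not the binning used for the figures.

```python
import numpy as np


def joint_frequency(a1: np.ndarray, a2: np.ndarray, edges1, edges2) -> np.ndarray:
    """Percentage of all pixels falling in each (a1, a2) bin of EOF space.

    freq[i, j] counts pixels with a1 in bin i and a2 in bin j. Pixels outside
    the edges fall in no bin but still count toward the total.
    """
    counts, _, _ = np.histogram2d(a1, a2, bins=[edges1, edges2])
    return 100.0 * counts / a1.size


rng = np.random.default_rng(1)
a1 = rng.normal(scale=3.0, size=10_000)        # hypothetical first-EOF amplitudes
a2 = rng.normal(scale=1.0, size=10_000)        # hypothetical second-EOF amplitudes
edges = np.linspace(-30.0, 30.0, 121)
freq = joint_frequency(a1, a2, edges, edges)

# The "outer extent" contour corresponds to the boundary of the occupied bins.
occupied = freq > 0
```

For contour plotting with a1 on the abscissa, the grid is usually transposed (freq.T) because histogram2d indexes the first variable along rows.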
4.1. Qualitative Comparisons
4.1.1. Observed Distributions
 The observational databases over the four tropical oceanic regions exhibit somewhat unique manifolds of radiance indices in EOF space for category 1 clouds (Figure 8). For example, the outer frequency contour for the NWP includes a parameter space of very high first EOF amplitude and large negative second EOF amplitude. This implies deep clouds that have high rain water contents and very large scattering effects at 85 GHz. The presence of significant amounts of graupel needed to produce such extreme examples of scattering is a sign of convective vigor. Hence, for the domains examined here, it appears that NWP has a greater frequency of intense deep convective clouds than the other oceanic regions. This tendency for more intense convection is the primary distinguishing factor between the manifolds of NWP and SWP.
 EP also exhibits a unique frequency distribution of radiance amplitudes in EOF space. The EP distribution is narrower than those of the other oceanic regions. There is also less dynamic range in positive values of the second EOF and a "notch" in the distribution indicating few occurrences of moderate positive first EOF amplitudes coupled with moderately negative second EOF amplitudes. The narrowness of the distribution implies a more peaked distribution of natural convective cloud structures (most convection similar to the mean profile), while the lack of moderate second EOF amplitudes coupled with moderate first EOF amplitudes suggests optically thin clouds for a given moderate convective rain content. Optically thin convective clouds with moderate rain content would be consistent with shallow, warm-rain events. Hence, it appears that for moderate convective rain content, clouds in EP have weaker precipitation-sized ice signatures overall than clouds in the other oceanic regions. High convective rain content, nevertheless, would be associated with deep clouds with strong scattering signatures, as the high amplitudes of the first EOF are coupled with strongly negative second EOF amplitudes in EP.
 The TA distribution is distinctive in that it lies between NWP/SWP and EP in terms of breadth and has a frequency of intense events similar to that of EP, but does not exhibit the notch in the frequency distribution found in EP at moderate values of the first EOF. Hence, the TA region appears to contain fewer intense convective clouds than are observed over NWP but is not dominated by shallow, optically thin clouds at moderate convective rain content. Instead, the TA distribution suggests that the region is populated by deep, moderately intense convection with moderately high rain contents and moderately strong scattering signatures at 85 GHz.
 The frequency distributions of EOF amplitudes for observed clouds do show some common traits among the four oceanic regions. In each case, for small ∣a1(o)∣ there is a positive correlation between the amplitudes of the first and second EOFs. Globally, of course, the correlation between first and second EOF amplitudes must be zero, but local correlations can exist in EOF space. For small departures from the mean convective cloud structure (i.e., for small x), the amplitude of the second EOF plays an important role in determining the vertical cloud structure. The clouds appear to be either tall with some precipitation-sized high-density ice (negative y values) or shallow with mostly rain or tall with rain below low-density ice (positive y values). As x increases to large positive values of the first EOF amplitude, the correlation between a1(o) and a2(o) becomes negative. This implies that as convective clouds contain more rain, they must also contain larger amounts of precipitation-sized high-density ice.
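 The distinction between global and local correlation can be demonstrated with a toy example. The sketch below builds a curved two-index manifold (an assumption chosen for illustration, not TMI data), computes its EOF amplitudes, and shows that the amplitudes are uncorrelated over the whole data set but strongly correlated within a subset of large first-EOF amplitudes.

```python
import numpy as np

rng = np.random.default_rng(7)
# Toy curved manifold in a two-index space: index 2 varies as the square of
# index 1, so the cloud of points bends away from a straight line.
x = rng.uniform(-1.0, 1.0, size=5000)
data = np.column_stack([x, x**2 + 0.01 * rng.normal(size=x.size)])

# EOFs of the data set (descending eigenvalue order) and the amplitudes.
evals, evecs = np.linalg.eigh(np.cov(data, rowvar=False))
evecs = evecs[:, np.argsort(evals)[::-1]]
a = (data - data.mean(axis=0)) @ evecs
a1, a2 = a[:, 0], a[:, 1]

global_r = np.corrcoef(a1, a2)[0, 1]             # zero by construction
tail = a1 > 0.5                                  # large positive first-EOF amplitudes
local_r = np.corrcoef(a1[tail], a2[tail])[0, 1]  # strongly nonzero
print(f"global r = {global_r:.1e}, local r (a1 > 0.5) = {local_r:.2f}")
```

The sign of local_r depends on the arbitrary sign convention of each eigenvector, but its magnitude does not; the zero global correlation follows because the EOFs diagonalize the sample covariance matrix.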
4.1.2. Simulated Distributions Within Their Own EOF Space
 The dotted contours in Figure 8 outline the frequency distribution of each simulated database in its own EOF space. Even within their own EOF spaces, the manifolds of the frequency distributions are quite different, suggesting that a robust simulation manifold for Bayesian-based retrievals using cloud-radiative model databases would require numerous simulations over many different environmental conditions. Based on this presentation, it is clear that the KWAJ simulation lacks diverse convective clouds with high rain contents, as the dynamic range of the first EOF is considerably smaller than that found in KAMP and TOGA. KAMP, on the other hand, exhibits more frequent large negative first EOF amplitudes than the other simulations. This part of the distribution would be consistent with weak convective clouds that contain lower water contents and no ice-scattering signatures. Indeed, the KAMP simulation produced more isolated small showers in the model domain than did the squall lines produced in the KWAJ and TOGA simulations. In contrast to both KWAJ and KAMP, the TOGA database produced a lobe of moderate to strongly negative amplitudes of both the first and second EOFs. This part of the parameter space would be consistent with deep clouds containing low rain contents but some scattering signatures. The transition zone [Biggerstaff and Listemaa, 2000] of mature-stage symmetric squall line systems is a good example of where such hydrometeor profiles could be observed. Indeed, the TOGA simulation produced the most classic symmetric leading-line trailing-stratiform structure [Houze et al., 1989] of all the simulations.
4.1.3. Simulated Distributions Within the Observed EOF Space
 Regardless of which observational EOFs the simulated radiance vectors are projected onto, there are considerable differences in the manifolds of each simulation database. Indeed, the observational radiance indices in EOF space can serve as a fingerprint of cloud structures needed to validate simulations of cloud systems.
 In comparing the frequency distributions of the simulated storms and the observed database, it is apparent that the locations of maximum frequency and the boundaries do not match well. While the observational data exhibit a maximum frequency near the center, the simulation data sets either lack a distinct peak or portray a peak at an erroneous position. This means that the models do not faithfully capture the local characteristic precipitation patterns and their occurrence. A mismatch of this type between occurrences in observations and simulations could significantly affect Bayesian retrievals that use cloud-resolving model data sets, since the algorithm either weights all simulated TBs surrounding a given observed TB vector according to the distance between that vector and each simulated TB vector or searches all simulated TBs using a maximum-likelihood approach [Seo et al., 2007c].
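 Why occurrence frequencies matter can be seen in a minimal sketch of distance-based Bayesian weighting. It assumes a Gaussian likelihood with independent channel errors of standard deviation sigma, which is one common choice; operational algorithms differ in their error models. The one-channel database, the rain rates, and the name bayesian_rain_estimate are hypothetical.

```python
import numpy as np


def bayesian_rain_estimate(tb_obs, tb_db, rain_db, sigma) -> float:
    """Weighted mean of database rain rates (Gaussian weighting assumed).

    Entry j gets weight exp(-0.5 * sum_k ((tb_obs_k - tb_db_jk) / sigma_k)^2).
    """
    tb_db = np.asarray(tb_db, dtype=float)
    d2 = np.sum(((tb_db - np.asarray(tb_obs, dtype=float)) / sigma) ** 2, axis=1)
    w = np.exp(-0.5 * (d2 - d2.min()))   # shifting by the minimum avoids underflow
    return float(np.sum(w * np.asarray(rain_db, dtype=float)) / np.sum(w))


# One-channel toy database: the observation lies halfway between two entries.
balanced = bayesian_rain_estimate([205.0], [[200.0], [210.0]], [1.0, 5.0], sigma=5.0)
# Same database with the second profile duplicated (overrepresented).
skewed = bayesian_rain_estimate([205.0], [[200.0], [210.0], [210.0]],
                                [1.0, 5.0, 5.0], sigma=5.0)
print(balanced, skewed)
```

Here the balanced database returns the midpoint rain rate, while duplicating one profile pulls the estimate toward it even though the observed TB is unchanged. A maximum-likelihood search would instead return the rain rate of the single highest-weight entry.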
 The mismatch between the observed and simulated manifolds is also evident in the regions of EOF space that are under- or overrepresented. For example, both KWAJ and KAMP overrepresent clouds with some liquid water content but little to no ice-scattering signature (large negative values of a1(s) coupled with positive values of a2(s)), such as developing convective clouds or weak convective showers. Yet, they underrepresent the high liquid water content clouds with extremely large amounts of precipitation-sized high-density ice that are observed in NWP (large positive a1(o) and large negative a2(o)). The over- and underrepresentation in different parts of the EOF space affects the overall variance, which is examined in more detail in section 4.2. The tendency of the simulations to overproduce graupel for a given rain content is also evident, as the narrowness and steep negative slope of the simulated frequency distributions at moderate to large first EOF amplitudes do not match the observed distributions at corresponding first EOF amplitudes. The stronger negative correlation between the first and second EOFs in the simulations at moderate to large first EOF values implies that the simulations produce excessive high-density ice and too little rain, which could result in overestimated TMI-derived rain rates based on scattering signatures [Seo et al., 2007b].
 Despite the mismatches between the observed and simulated frequency distributions, it should be noted that the TOGA simulation produced a distribution that agrees in character with the observed frequency distribution in EP. Missing from the simulated database, however, is the notch in the frequency distribution contours that appears in EP. Thus, the moderately strong convective rain contents in the TOGA simulation are not dominated by optically thin clouds as observed in EP.
4.1.4. Comparison of Frequency Distribution in Nonconvective Clouds
 For both the mixed and nonconvective (categories 2 and 3) classifications, the observed frequency distributions of EOF coefficients in their own EOF space (Figures 9 and 10) were quite similar across the tropical oceans. The only significant difference is found in EP where, for small ∣a1(o)∣, the correlation between the first and second EOF amplitudes is positive, while the other oceanic distributions were weakly correlated or nearly uncorrelated. This suggests that even the mixed and stratiform classifications with low rain contents in EP consisted of clouds with less variability than in the other oceanic regions.
 In terms of overlapping manifolds, the simulated and observed databases agreed best for the mixed echo classification (Figure 9), with two notable exceptions. First, the KAMP simulation continued to overrepresent clouds with some liquid water content but little high-density ice content. Second, all the simulations appeared to have a higher frequency of extreme water and ice contents than the observations. This latter point is particularly pronounced for the nonconvective (category 3) classification (Figure 10), where the simulations grossly overrepresent extreme hydrometeor content profiles. Indeed, the displaced maximum frequency in the TOGA simulation, compared with observed nonconvective clouds, clearly illustrates the impact of high graupel content in simulated stratiform clouds, a fundamental characteristic of three-class ice parameterization schemes as noted by Biggerstaff et al.
4.2. Quantitative Comparisons
 In the previous section, the manifolds of the amplitudes of the first and second EOFs were compared qualitatively. Here we consider a quantitative analysis of the variance of the amplitudes of the simulated perturbation radiance vectors projected onto each of the observational EOFs, normalized by the variance of the observed perturbation radiance vectors projected onto the same EOFs. Since much of the variance of the radiance index vector is explained by the first EOFs, such an analysis provides insight into the relative importance of simulated variability and observed regional variability in statistical retrieval methods. It should be noted that the first EOF is virtually identical across oceanic domains and across the simulations for each cloud classification category. Hence, the regional variability is better defined by the variability with respect to the second EOFs.
 To quantify the differences identified in the qualitative analysis, ratios of the projected variances of the simulations to the observational variances were calculated. The variance ratio is defined as
$$r_i = \frac{c_i}{\nu_i}, \qquad i = 1, 2,$$
where ci and νi denote, respectively, the variance of a given simulation projected onto the ith EOF (ei(o)) and the variance of a given observational data set along the same EOF. Given the shape of the first EOF, i = 1 can be interpreted as denoting comparatively deep clouds with moderate to high rain content and moderate to large scattering signatures, and i = 2 can be interpreted as clouds with weak to moderate amounts of rain and little ice-scattering signature or clouds consisting mostly of ice with weak scattering signatures. Departures of the variance ratios (r1 and r2) from unity generally correspond to large uncertainties in the retrievability of cloud structure. Thus, this approach helps quantify the applicability and plausibility of the predefined cloud-radiation databases for satellite microwave retrievals.
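 The variance ratio can be sketched in a few lines, assuming the same array layout as the projection step (pixels by radiance indices) and observed EOFs taken from the sample covariance matrix. The function name variance_ratios and the synthetic data are hypothetical.

```python
import numpy as np


def variance_ratios(sim: np.ndarray, obs: np.ndarray, n: int = 2) -> np.ndarray:
    """r_i = c_i / nu_i along the leading n EOFs of the observed radiance indices."""
    evals, evecs = np.linalg.eigh(np.cov(obs, rowvar=False))
    e_obs = evecs[:, np.argsort(evals)[::-1][:n]]                 # observed EOFs e_i^(o)
    c = ((sim - sim.mean(axis=0)) @ e_obs).var(axis=0, ddof=1)    # projected simulated variance
    nu = ((obs - obs.mean(axis=0)) @ e_obs).var(axis=0, ddof=1)   # observed variance
    return c / nu


rng = np.random.default_rng(3)
obs = rng.normal(size=(3000, 5)) @ rng.normal(size=(5, 5))   # hypothetical indices

r_same = variance_ratios(obs, obs)          # identical databases: both ratios are 1
r_scaled = variance_ratios(4.0 * obs, obs)  # four times the spread: both ratios are 16
print(r_same, r_scaled)
```

Because the projection is linear, multiplying every simulated perturbation by a factor k multiplies both ratios by k squared. A database whose spread is uniformly four times larger therefore yields ratios about sixteen times larger along the same EOFs, one way large variance ratios can arise from a broad simulated distribution alone.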
4.2.1. The Importance of Simulated Databases
 The left column of Figure 11 shows the values of variance ratios for projections onto the first EOF of each observational region partitioned by observational domain and cloud classification category. Since the first EOFs of the observations are essentially identical, most of the variability is due to differences in the simulations. For example, the predicted retrievability using the KAMP simulation is much worse than the predicted retrievability for the other simulations. In part, this suggests that the environmental condition in the simulation is not reflective of the oceanic domains for which the observations were taken. The KWAJ and TOGA simulations are a better match. Accounting for the proper environmental condition in the simulated database is more important here than the inherent regional variability between the western and eastern Pacific Oceans.
 The importance of simulated database characteristics is further illustrated by the distinct model dependence in variance ratios for category 3 cloud classification (Figure 11c). Here, the KWAJ simulation produced very low variance ratios, while the TOGA simulation produced very high variance ratios. This result is directly related to the magnitude of the variance in the radiance vectors for category 3 classification across the simulations. The KWAJ simulation had the lowest variance, similar in magnitude to the observed variance. Given that its perturbation radiance vectors were small (i.e., most profiles close to the mean), the projection of the perturbations onto the first EOF yielded small values with little overall variance. In contrast, the TOGA simulation had four times the standard deviation of the KWAJ simulation in the distribution of radiance vectors in category 3 clouds. This large variability, inherent in the model database, led to large variances in the amplitudes of the perturbation radiance vectors projected onto the first EOF.
 The impact of model variability noted for categories 1 and 3 classifications is evident in category 2 classifications as well. Indeed, it appears that the mixed cloud classification exhibits the worst characteristics of each of the primary cloud types resulting in poor predicted retrievability.
 Before discussing the variance ratios for projections onto the second EOF, it is important to note that the predicted utility of the KWAJ simulation relative to the KAMP simulation is somewhat misleading. Recall from section 4.1 that the manifold of the KWAJ frequency distribution of EOF coefficients in EOF space contained areas of both over- and underrepresentation (Figure 8a). The combination of over- and underrepresented parts of the EOF parameter space resulted in relatively low variance compared to the other simulations. Hence, to thoroughly examine the suitability of a simulated database for microwave retrievals, the spatial distribution of the EOF coefficient manifolds, along with some measure of the alignment of the frequency distributions in EOF space, is required.
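 One possible alignment measure, offered here only as an illustrative assumption and not as the measure used in this study, is the histogram intersection of the binned (a1, a2) distributions: 1 for identical binned distributions and 0 for distributions sharing no bins. The sketch contrasts it with the variance ratio for a synthetic case in which the variances match but the shapes differ.

```python
import numpy as np


def eof_space_overlap(a_obs: np.ndarray, a_sim: np.ndarray, edges) -> float:
    """Histogram intersection of two (a1, a2) frequency distributions (in [0, 1])."""
    h_obs, _, _ = np.histogram2d(a_obs[:, 0], a_obs[:, 1], bins=[edges, edges])
    h_sim, _, _ = np.histogram2d(a_sim[:, 0], a_sim[:, 1], bins=[edges, edges])
    return float(np.minimum(h_obs / h_obs.sum(), h_sim / h_sim.sum()).sum())


rng = np.random.default_rng(11)
a_obs = rng.normal(size=(5000, 2))                  # hypothetical observed amplitudes
a_sim = rng.choice([-1.0, 1.0], size=(5000, 2))     # same variance, very different shape
edges = np.linspace(-4.0, 4.0, 17)

ratios = a_sim.var(axis=0, ddof=1) / a_obs.var(axis=0, ddof=1)
overlap = eof_space_overlap(a_obs, a_sim, edges)
print(f"variance ratios {ratios.round(2)}, overlap {overlap:.2f}")
```

Both variance ratios come out close to 1 while the overlap is small, mirroring the point that a favorable variance ratio can hide a poor match between manifolds.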
4.2.2. Importance of Regional Variability
 Since the first EOFs are essentially identical, the regional variability across the oceanic domains is contained in the second EOF. Using the second EOF as the basis of the projection should therefore provide a quantitative measure of the importance of regional variability relative to simulated variability in the retrieval process. Indeed, it appears that regional variability is about as important as the variability among the simulations for the category 1 and 2 classifications. The difference in magnitude of the variance ratio for a given simulation across the different observational regions is about as large as the difference in variance ratio for a given domain due to the various simulations. A hint of the regional variability is evident in the first EOF projections for category 1 as well, for the KWAJ and TOGA simulations, which are more suitable than KAMP. For category 3, nonconvective clouds, the regional variability is much less important, as the simulations are inherently poor at simulating stratiform clouds.
 In general, for all cloud categories, all simulations show relatively large departures from unity in r1 and r2. As noted, this is partially related to overrepresentation and underrepresentation in the manifolds of EOF coefficients in EOF space and partially related to different structures in the frequency distribution in EOF space. When projecting the perturbation radiance vectors onto the first EOF, the impact of the differences among the simulations seems to be much larger than the impact of regional variability in the observations. This suggests, therefore, that the simulation databases used in microwave retrievals should be prepared by considering their suitability for the regions in which they are applied. For example, for deep convective clouds with moderate to large rain content and a moderate to large scattering index, a single simulation with appropriate environmental conditions can represent all the ocean regions considered here, since the regional variability shown is not large compared to the simulation variability. On the other hand, for shallow warm-rain events and for tall clouds with low rain content and a weak to moderate scattering index, a single chosen simulation cannot represent all the oceans, since the regional variability shown is comparable to the simulation variability.
 In general, Bayesian rain retrieval algorithms use predefined data that may be based on simulations of convective storm systems from cloud-resolving models to relate observed brightness temperatures (or radiance indices, Is) to rain rate. The simulated database is implicitly assumed to represent all possible cloud systems. The accuracy of retrieved rain rate could be strongly affected by discrepancies between observed distributions of Is and the simulated distributions of Is. In this study, a technique based on EOF analysis was developed to illustrate both qualitative and quantitative methods that could be used to evaluate how well simulated databases represent the variability of naturally occurring clouds. The technique was demonstrated by comparing the observed regional variability of Is over four tropical oceanic regions to data sets from three numerical simulations of tropical oceanic convective storm systems. The simulations were chosen to represent much of the spectrum of storm types that occur over the tropical oceans and to illustrate fundamental concerns regarding the construction of model databases.
 For radiometer fields of view that consist mostly of convection (category 1), an essential criterion to representing observed clouds was to select an appropriate thermodynamic environment that is consistent with the environment over which the retrieval is to be applied. While all three simulations used soundings taken from the environments of observed oceanic convective systems, the simulation based on the Keys Area Microphysics Project off Florida produced a much broader range of convective cell strength due to the greater instability as compared to the other soundings. When the simulated manifolds of Is were projected onto the EOF space of the observational Is, it was clear that the KAMP simulation produced numerous convective clouds with modest amounts of liquid water but little ice-scattering signature that had no equivalent in the observed database. The resulting offset in the I manifolds in EOF space led to large values of variance ratios that illustrated the lack of suitability of that simulation for retrieving cloud structures in the ocean areas considered here. Taking a more suitable environment into account, as in the KWAJ simulation, allowed the regional variability between the east and west Pacific to be diagnosed in the EOF analysis. The regional variability was particularly well documented in the variance ratios when the perturbation radiance indices were projected onto the second EOF, as the regional variability was almost fully contained in the second EOF structure. Both category 1 and category 2 variance ratios showed that regional observed variability was just as important as the selection of the simulated database in the retrieval process.
 For radiometer fields of view that consisted of nonconvective rain (category 3), the EOF analysis illustrated a fundamental flaw in the character of the simulated clouds. Namely, stratiform rain in bulk three-class ice parameterization schemes is associated with melting graupel rather than melting snow. High-density graupel produces a strong scattering signature in the nonconvective category that is not present in the observed I manifolds. This limitation in the cloud microphysical parameterization was well documented by Biggerstaff et al. and Seo and Biggerstaff. While all the simulations overrepresent graupel for the amount of rain present, the TOGA simulation had the highest frequency of occurrence, consistent with its larger stratiform rain region compared to the other simulations. Consequently, the TOGA simulation had the highest variance ratios, which reflect both the offset in the simulated I manifolds projected onto the observed EOF space and the large variance of the simulated Is. The variability in variance ratios suggests that for category 3 rain the choice of model database is more important than any physical regional variability in observed Is. Given that category 3 represents more than 70% of the raining fields of view, the rather poor performance of the simulated databases illustrates how limits in microphysical parameterizations may affect cloud-resolving model-based Bayesian retrievals.
 Another limitation of the simulations is their tendency to overrepresent some types of clouds and underrepresent others. For the KWAJ simulation, the combination of over- and underrepresented portions of the EOF space reduced the variance ratio, making it seem that KWAJ was the most suitable simulation for the given observational databases. In reality, the mismatch in EOF manifolds would make KWAJ a rather poor simulated database for retrievals. The EOF analysis of KWAJ illustrates the importance of conducting both qualitative comparisons, matching the frequency distributions of EOF coefficients in EOF space, and quantitative comparisons such as the variance ratios. Examining the EOF patterns of Is alone was insufficient, as the model databases had very similar first EOFs and the KAMP and KWAJ simulations had very similar second EOFs. The similarity of the I EOFs did not reveal the bias introduced by the higher-CAPE environment of the KAMP simulation. This bias was revealed when the simulated Is were projected onto the observed EOF space, producing a two-dimensional view of the I manifolds.
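The qualitative check argued for here, comparing where the simulated and observed coefficients actually fall in EOF space, can be sketched by binning both sets of (EOF1, EOF2) coefficients on a common grid. In this hypothetical helper, a bin counts as populated when its fraction of pixels reaches a floor (0.001 by default, mirroring the lowest frequency contour drawn in Figure A1). The value +1 marks bins populated only by the simulation (overrepresentation) and −1 marks bins populated only by the observations (underrepresentation). The grid, floor, and function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def frequency_map(coeffs, edges):
    """Fraction of all pixels in each (EOF1, EOF2) bin; pixels off the grid are dropped."""
    counts, _, _ = np.histogram2d(coeffs[:, 0], coeffs[:, 1], bins=[edges, edges])
    return counts / len(coeffs)


def representation_map(sim_coeffs, obs_coeffs, edges, floor=0.001):
    """+1: simulation-only bins, -1: observation-only bins, 0: both or neither."""
    sim_on = frequency_map(sim_coeffs, edges) >= floor
    obs_on = frequency_map(obs_coeffs, edges) >= floor
    return sim_on.astype(int) - obs_on.astype(int)


# Synthetic example: the simulated manifold is shifted along the first EOF axis.
rng = np.random.default_rng(1)
obs = rng.normal(size=(5000, 2))
sim = rng.normal(size=(5000, 2)) + np.array([3.0, 0.0])
edges = np.linspace(-6.0, 9.0, 31)
rep = representation_map(sim, obs, edges)
print("overrepresented bins:", int((rep == 1).sum()))
print("underrepresented bins:", int((rep == -1).sum()))
```

Because the two maps are compared bin by bin, a simulation can score well on a single spread statistic while still showing large +1 and −1 regions, which is the situation described for KWAJ.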
 Additionally, the analysis presented here demonstrates the difficulty of developing a simulated database that reproduces the broad range of observed clouds without overrepresenting a portion of the I manifolds. In particular, a fundamental requirement appears to be that simulations be conducted with thermodynamic environments similar to those of the regions over which the retrievals are performed. Perhaps more importantly, greater sophistication is required in the microphysical parameterizations to remove the discrepancy between simulated and naturally occurring stratiform rain in the relationship between surface rain and graupel content. While analysis of passive microwave signatures is useful, observational data sets with simultaneous polarimetric radar, wind field, and passive microwave measurements will clearly afford the greatest opportunity to test the new microphysical schemes needed to reduce the limitations inherent in current cloud-resolving models that affect retrievals of cloud properties from satellite observations.
Appendix A: Cloud Hydrometeor Profiles in EOF Space
 To provide insight into the physical interpretation of the EOF analysis conducted here, localized mean profiles of cloud hydrometeor properties for different portions of the EOF space are presented in Figure A1. Each profile is the local mean perturbation around a point in EOF space added to the average profile from the simulation for the corresponding cloud classification category.
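The construction of each Figure A1 profile can be sketched as below. The sketch assumes a circular neighborhood of hypothetical size `radius` in EOF-coefficient units, and hydrometeor perturbations stored per pixel as deviations from the simulation's category-average profile. The neighborhood shape and size used for Figure A1 are not stated here, so both, along with the function name, are illustrative.

```python
import numpy as np


def local_mean_profile(point, coeffs, perturbations, mean_profile, radius):
    """Category-average profile plus the mean perturbation of nearby pixels.

    point: (EOF1, EOF2) location of interest.
    coeffs: (n_pixels, 2) EOF coefficients of the simulated pixels.
    perturbations: (n_pixels, n_levels) hydrometeor profile minus category mean.
    mean_profile: (n_levels,) category-average profile from the simulation.
    radius: hypothetical neighborhood size in EOF-coefficient units.
    """
    distance = np.linalg.norm(coeffs - np.asarray(point, dtype=float), axis=1)
    nearby = distance <= radius
    if not nearby.any():
        return None  # no simulated pixels near this point in EOF space
    return mean_profile + perturbations[nearby].mean(axis=0)


# Toy example with two vertical levels (e.g., mass contents in g m-3).
coeffs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
perturbations = np.array([[0.1, -0.2], [0.3, 0.0], [2.0, 2.0]])
mean_profile = np.array([0.5, 0.4])
print(local_mean_profile((0.0, 0.0), coeffs, perturbations, mean_profile, radius=1.0))
```

Near the origin of EOF space (point C in Figure A1) the mean perturbation tends toward zero, so the result approaches the category-average profile, which is why column C closely represents the average cloud structure.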
 In the example shown, the KAMP simulated perturbation radiance indices were projected onto the EOFs of the EP observed radiance indices. The black solid contours represent the frequency distribution of the EP observations. The dotted line outlines the 0.001 frequency contour for the KAMP simulation in its own EOF space (not projected onto the EP radiance vectors), while the shaded solid contours show the KAMP frequency distribution projected onto the EP observed radiances. This arrangement is the same as that used in Figures 8, 9, and 10. Although this example uses KAMP and EP, similar profiles would, for the most part, be found for other projections.
 Point C in Figure A1 marks areas where both the first and second EOF coefficients are nearly zero. Hence, column C closely represents the average cloud structure from the KAMP simulation. The convective clouds in KAMP contained, on average, 0.5 g m−3 of rainwater up to the freezing level. Above the freezing level, the average structure consisted of a deep layer of graupel with little snow. Cloud ice content increased linearly from zero at the freezing level to about 0.4 g m−3 at 11 km. The mixed cloud classification has nearly the same structure as the convective classification, but with only ∼40% of the hydrometeor mass contents. The stratiform classification, on the other hand, has a distinct structure, with average rain and cloud water contents of less than 0.1 g m−3 below the freezing level and little precipitation-sized ice aloft. Note that the KAMP simulation produced a small stratiform area. Projections of TOGA onto EP produced a mean profile that contained considerable graupel content aloft (not shown).
 If we consider just the profiles that have strong first EOF coefficients and near-zero second EOF coefficients, denoted by the letter E in Figure A1, and recall the structure of the first EOF of EP (Figure 5a), we see that these clouds are associated with significant rain contents and moderately high graupel contents. This structure, proposed in section 3.1.1, holds for all echo classifications, although the mass contents are much greater in the convective classification than in the mixed and stratiform classifications. In contrast, column D shows the profiles that have large negative first EOF coefficients. These clouds have limited liquid water and almost no ice, which would be consistent with the developing stages of a convective cell, in which the cloud water and rainwater contents are nearly equal.
 If we consider only the second EOF, focusing on distributions along the y axis, and examine profiles with positive second EOF coefficients (point F), we find category 1 and category 2 profiles that contain above-average amounts of liquid water and below-average amounts of ice aloft. Indeed, this is the closest the model comes to the warm-rain clouds that section 3.1.1 identified as consistent with this type of EOF pattern. The inverse EOF pattern, with negative second EOF coefficients (column G), is consistent with dissipating clouds that contain significant quantities of cloud ice and graupel but relatively little rain near the surface.
 Considering category 3, for which we proposed that the second EOF was more likely associated with stratiform rain, profiles around point F illustrate how the 85 GHz attenuation index could become saturated yet be associated with no ice-scattering signature. Namely, these profiles contained modest amounts of liquid water and no ice aloft. Thus, the EOF pattern saturates at the 85 GHz attenuation index and has a warmer-than-average 85 GHz scattering signature. This structure is the closest the simulation comes to real stratiform clouds that contain liquid water with little to no high-density ice aloft. The inverse profiles (negative second EOF coefficients, shown in column G of Figure A1) are more indicative of modeled stratiform rain, in which the rain is produced by melting graupel and the amount of rain decreases toward the surface because of evaporation in subsaturated downdrafts [Zipser, 1977]. Thus, the explanation given for the saturation at the 85 GHz attenuation index, namely that the TMI pixels likely have weak average stratiform rain rates, is consistent with the profiles illustrated here.
 The frequency distribution in EOF space shows that a number of simulated profiles have large positive first EOF coefficients and moderately negative second EOF coefficients, denoted by A in Figure A1. This combination of EOF patterns is associated with slightly above-average rain and significantly above-average graupel. Indeed, even category 3 clouds in this part of the EOF space appear to be more like convective clouds than stratiform clouds. The KAMP simulation contained several isolated cells along the northern periphery of the convective system. Given our echo classification scheme, some of the category 3 pixels in the KAMP data set would have contained a mix of isolated convection and echo-free air. This “convective cells mixed with clear air” type of category 3 pixel populates this portion of the EOF parameter space. While the frequency contours show that the simulation projected onto EP had several such profiles, the frequency contours for the observed category 3 classification show that very few profiles with this structure existed in the observed database.
 The KAMP simulation produced a number of profiles in which the first EOF coefficient was moderately negative and the second EOF coefficient was slightly positive, denoted by B in Figure A1. This portion of the EOF parameter space contains profiles with low mass contents dominated by a mixture of cloud water and cloud ice, with a little graupel aloft. For category 1, this type of profile is consistent with tall developing convective clouds similar to those near point D, only with higher mass contents. For category 3, the structure is more similar to weak stratiform rain. Interestingly, the observed EP data set had no discernible concentration of profiles in that part of the parameter space for category 1 but did have some for category 3.
 In summary, the physical interpretations of the EOF patterns presented throughout the paper are consistent with the types of hydrometeor profiles found within the EOF parameter space. Nevertheless, given the time and space scales of convection, the size of the TMI field of view, and the insensitivity of the TMI frequencies to cloud ice and low-density snow [Biggerstaff et al., 2006], more than one interpretation of a particular EOF pattern may still be possible.
 TRMM data were provided by the Goddard Distributed Active Archive Center. The first author was supported by the National Science Foundation in the United States of America under grants ATM-0619715 and ATM-0802717. The corresponding author was funded by the Korea Meteorological Administration Research and Development Program under grant CATER 2009-2108.