Ice-cloud particle habit classification using principal components



[1] A novel automatic classification method is proposed for identifying the habits of large ice-cloud particles and deriving the shape distribution of particle ensembles. This IC-PCA (Ice-crystal Classification with Principal Component Analysis) tool is based on a principal component analysis of selected physical and statistical features of ice-crystal perimeters. The method is developed and tested using image data obtained with a Cloud Particle Imager, but can be applied to other silhouette data as well. For three randomly selected test cases of 222, 200, and 201 crystals from tropical, midlatitude, and arctic ice clouds, the combined classification accuracy of the IC-PCA is 81.1%. Since previous, semiautomatic classification methods are more time-consuming and include a subjective phase, the automatic and objective IC-PCA offers a notable improvement in retrieving the shapes of the individual crystals. As the habit distributions of ice-cloud particles can be applied to computations of radiative impact of cirrus, it is also demonstrated how classification uncertainties propagate into the radiative transfer computations by using the arctic test case as an example. Computations of shortwave radiative fluxes show that the flux differences between clouds of manually and automatically classified crystals can be as large as 10 Wm−2 but also that two manual classifications of the same image data result in even larger differences, implying the need for a systematic and repeatable classification method.

1. Introduction

[2] Tropospheric ice clouds, hereafter cirrus, cover a considerable fraction of the surface of the Earth both spatially and temporally and, therefore, affect the climate locally and globally through microphysical and radiative processes. One of the most important global impacts is the contribution of cirrus to the radiative energy budget of the Earth-atmosphere system. However, quantitative estimates on the impact are difficult to establish because currently both modeling and measurement techniques suffer from large uncertainties related to the microphysical properties of ice cloud particles.

[3] Ice clouds differ from water clouds in that they are composed of ice crystals of varying shapes instead of spherical water droplets. The shapes of the crystals should not be ignored when determining the total mass and the optical thickness of the cloud but shapes are especially crucial when determining how single ice crystals interact with electromagnetic radiation [Macke et al., 1996; Takano and Liou, 1995; Um and McFarquhar, 2007, 2009, 2011; Hong et al., 2009; Yang and Liou, 1996, 1998; Yang et al., 2005]. Indeed, the radiative impact of cirrus depends on the single-scattering properties of the ice crystals that depend, in addition to shape, on the relative size of the particles with respect to the wavelength of light.

[4] Observing single ice crystals and measuring their size and shape distributions within a cloud is challenging but can be carried out by aircraft-mounted instruments such as the Stratton Park Engineering Company's Cloud Particle Imager (CPI). The maximum diameter of the ice cloud particles typically ranges from 10 μm to several mm [Dowling and Radke, 1990]. CPI images from flight campaigns in the tropical [Noel et al., 2004; Connolly et al., 2005; Um and McFarquhar, 2009], midlatitude [Baker et al., 2006; Heymsfield et al., 2002; Um and McFarquhar, 2007; Mauno et al., 2011], and arctic regions [Korolev and Isaac, 1999; Lawson et al., 2001; Korolev and Isaac, 2003; McFarquhar et al., 2007, 2011] reveal that large (maximum diameter ≥100 μm) ice crystals in cirrus clouds show a variety of shapes, referred to as habits, ranging from single hexagonal columns, bullets, and plates to regular and irregular aggregates. There are also a large number of small crystals that often appear quasi-spherical [Nousiainen and McFarquhar, 2004; Nousiainen et al., 2011]. The proportion of each shape depends on the prevailing meteorological conditions, temperature and vapor content, and therefore often varies between clouds and within a cloud.

[5] Since Weickmann [1947] first obtained images of ice crystals in cirrus, there have been numerous attempts to image and classify shapes of cloud particles using aircraft measurements and ground-based observations. Early studies focused on images acquired at high altitude ground sites [Nakaya, 1954; Zamorsky, 1955; Magono and Lee, 1966]. After the development of Optical Array Probes (OAP) for aircraft platforms, large databases on cloud particle shapes were obtained, to which several new classification techniques were applied. Rahman et al. [1981] developed a technique for extracting features of two-dimensional binary images of ice particles and raindrops. From such features derived from OAP images, Hunter et al. [1984] developed a classification algorithm that distinguished six categories of ice crystals. Holroyd [1987] then suggested a technique using observed properties (i.e., size, linearity, area, perimeter, and image density) of particle images to classify asymmetric ice particles into nine categories. Moss and Johnson [1994] used a pattern recognition technique to identify seven cloud particle categories, Garbrick et al. [1995] and McFarquhar et al. [1999] applied a neural network technique to images of ice crystals to identify shapes, and Fouilloux et al. [1997] classified particles based on a statistical analysis of small hexagonal columns and spherical particles imaged by a two-dimensional OAP. Korolev and Sussman [2000] developed an algorithm for four categories based on an analysis of dimensionless ratios of simple geometrical measures. Classification schemes based on high-resolution images obtained by a CPI have also been developed [Lawson et al., 2006; Um and McFarquhar, 2009].

[6] This study introduces an automatic habit classification tool based on pattern recognition for efficient and consistent processing of CPI image data to be used for generating shape distributions of observed ice cloud particles. The classification tool is based upon a detailed analysis of different parameters that describe the two-dimensional shapes of ice crystals, from which criteria for classifying those shapes can be derived. Through testing, all parameters of the classifier are ultimately fixed for future applications. The remainder of the paper is organized as follows. Ice crystals and their typical habits are presented in Section 2. In Section 3, the statistical theory of classification using pattern recognition is presented together with an introduction to various features that can be measured from the CPI images. The actual shape classification is presented and discussed in Section 4. In Section 5, the work is summarized, key findings are emphasized and conclusions are drawn while also considering relevant applications and the significance of this study. An example application of the retrieved habit distribution for computing the radiative properties of an arctic ice cloud is presented in the Appendix, where the flux differences resulting from manually and automatically classified data are quantified.

2. Ice Crystal Silhouettes

[7] Single ice crystals in tropospheric cirrus clouds can be imaged using a CPI instrument mounted on an aircraft. Crystals that pass through the sample volume of the CPI are individually illuminated and imaged, resulting in an image with a dark, or partially translucent, silhouette on a brighter background. CPI images show two-dimensional projections of real ice crystals, not the three-dimensional shapes that are required for radiative transfer calculations. In some cases, one cannot avoid ambiguous interpretations when estimating these shapes. The shapes of small ice crystals are especially difficult to define because of shattering effects and the blurring caused by diffraction and by the limited resolution, 2.3 μm, of the CPI instrument [Um and McFarquhar, 2011]. Therefore, we concentrated on the habit classification of large ice crystals that have a maximum diameter Dmax above 100 μm.

[8] Although the shapes of natural ice crystals vary extensively, certain habits are recognizable and occur in cirrus clouds at all latitudes. In a recently published review, Baran [2009] summarizes the existing shape models of ice crystals: hexagonal columns, plates, chain aggregates, bullet rosettes, and bullet rosette aggregates; polycrystals, quasi-spheres, droxtals, Chebyshev particles, and particles with inhomogeneous composition. Since the actual variety of shapes is larger [Bailey and Hallett, 2009, 2012], several other habits, such as capped columns and budding rosettes, have also been identified. At this point, we built the classification system on eight common habits: single plates, bullets, columns, irregular crystals, rosette aggregates, bullet rosettes, plate aggregates, and column aggregates. An example crystal of each class is shown in Figure 1. Other habits are rarer in nature and, therefore, the necessity of adding them as separate classes to this classification can be reconsidered in the context of future applications or observations.

Figure 1.

Sample ice crystal silhouettes that represent the different habit classes: 1) plate, 2) bullet, 3) column, 4) irregular, 5) rosette aggregate, 6) rosette, 7) plate aggregate, and 8) column aggregate.

3. Classification Methods

[9] Classification of objects into separate classes based on observations or measurements is ultimately a pattern recognition problem where the goal is to automatically identify certain features, a pattern, that indicates the correct class for the object [Webb and Copsey, 2011]. A pattern recognition system generally has two distinct parts: a feature extractor and a classifier. The feature extractor includes processing of the observation vectors x, typically in a way that the dimensions (dim) are reduced so that the classifier only needs to work with feature vectors y (dim(y) < dim(x)). Here, we use principal component analysis (PCA), which is a well-established and commonly used statistical technique, for performing the dimension reduction, and the Bayesian probability and k nearest neighbors approaches as classification techniques [Webb and Copsey, 2011].

[10] In practice, the pattern recognition system is constructed based on known samples, called training data, that consist of the feature vectors and the true, known classes of the objects. The performance of the pattern recognition system is then tested with the feature vectors of separate, independent test data.

[11] In PCA, the observation vectors x_T of the training data are put together as an N × d_in matrix X_T, where N is the number of objects and d_in the number of observations per object, i.e., the initial number of dimensions. A spectral decomposition exists for the correlation matrix, corr(X_T) = V Λ V^T, from which we can solve for the eigenvalues Λ and the corresponding eigenvectors V. Therefore, the principal component transform for the training data X_T is Y_T = V^T X_T, and the jth principal component is Y_T,j = v_j^T X_T. Dimension reduction is done by choosing only the d eigenvectors that correspond to the largest eigenvalues Λ = diag(λ_1, λ_2, …, λ_d), where λ_1 ≥ λ_2 ≥ … ≥ λ_d ≥ 0. In this way, most of the information is preserved in the reduction of dimensions. A new observation x requiring classification is first transformed into this principal component space with the eigenvector matrix V_d = [v_1^T … v_d^T]: y = V_d^T x. After this, classification techniques such as the k nearest neighbors (kNN) or Bayesian probability can be applied.
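
The transform described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the IC-PCA implementation: the function names `fit_pca` and `transform`, the row-wise layout of the observations, and the standardization of each feature before projection (which we assume is implied by the use of the correlation matrix) are assumptions made for the example.

```python
import numpy as np

def fit_pca(X_train, d):
    """Fit a correlation-matrix PCA on an N x d_in training matrix.

    Returns the feature means, feature standard deviations, the d leading
    eigenvectors (as columns), and their eigenvalues in descending order.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    # Correlation matrix of the training features (d_in x d_in).
    corr = np.corrcoef(X_train, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:d]   # indices of the d largest
    return mean, std, eigvecs[:, order], eigvals[order]

def transform(x, mean, std, V_d):
    """Project one observation (or rows of observations) onto d components."""
    return ((x - mean) / std) @ V_d
```

With this layout, the sample variance of each projected training coordinate equals the corresponding eigenvalue, which is one way to check that the components are ordered by the information they retain.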

[12] The classification can be done simply by measuring the d-dimensional distance of y from all the training data points YT,d with known classes c. In the kNN technique, the class of y is determined by the classes of the k nearest points. The choice can be weighted by the inverse of distance. In this study, we considered values of k = 1, 3, and 5 (referred to as 1NN, 3NN, and 5NN), the latter two both with and without weighting.
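
As an illustration of the kNN vote with optional inverse-distance weighting, consider the following sketch. The function name `knn_classify`, the Euclidean distance, and the small constant that guards against division by zero are assumptions of this example and are not taken from the IC-PCA code.

```python
import numpy as np

def knn_classify(y, Y_train, classes, k=5, weighted=True):
    """Classify y by a k-nearest-neighbor vote in principal component space.

    Y_train is an N x d array of transformed training points and `classes`
    holds their N known class labels.
    """
    dist = np.linalg.norm(Y_train - y, axis=1)
    nearest = np.argsort(dist)[:k]
    votes = {}
    for i in nearest:
        # Inverse-distance weight; the epsilon only matters for exact matches.
        w = 1.0 / (dist[i] + 1e-12) if weighted else 1.0
        votes[classes[i]] = votes.get(classes[i], 0.0) + w
    return max(votes, key=votes.get)
```

When two close neighbors of one class compete with three distant neighbors of another, the unweighted 5NN vote follows the majority, whereas the weighted vote follows the close neighbors.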

[13] The Bayesian probability pB for the vector y to belong to a class c with an average μ and a covariance matrix C is

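p_B = \frac{p_{\mathrm{pri}}}{(2\pi)^{d/2}\,|\mathbf{C}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^{T}\mathbf{C}^{-1}(\mathbf{y}-\boldsymbol{\mu})\right] \qquad (1)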

where ppri is the a priori probability for the class c and d the number of dimensions. The observation is classified based on the highest value of pB.
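
A Bayesian classifier of this kind can be sketched as follows, assuming that pB takes the form of a multivariate Gaussian density multiplied by the a priori probability, and that the class averages and covariance matrices are estimated from the transformed training data. The function names and the equal priors in `bayes_classify` are choices made for this example.

```python
import numpy as np

def bayes_score(y, mu, C, p_pri):
    """Prior-weighted Gaussian density of y for a class with mean mu, covariance C."""
    d = len(mu)
    diff = y - mu
    maha = diff @ np.linalg.solve(C, diff)      # (y - mu)^T C^-1 (y - mu)
    norm = (2.0 * np.pi) ** (-d / 2.0) / np.sqrt(np.linalg.det(C))
    return p_pri * norm * np.exp(-0.5 * maha)

def bayes_classify(y, Y_train, classes):
    """Return the class with the highest score, assuming equal priors."""
    classes = np.asarray(classes)
    labels = sorted(set(classes.tolist()))
    scores = {}
    for c in labels:
        Yc = Y_train[classes == c]
        C = np.atleast_2d(np.cov(Yc, rowvar=False))   # also covers d = 1
        scores[c] = bayes_score(y, Yc.mean(axis=0), C, 1.0 / len(labels))
    return max(scores, key=scores.get)
```

Each class needs more training points than dimensions for its covariance matrix to be invertible, which is worth keeping in mind when d is large.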

[14] The most challenging task in statistical pattern recognition is to choose such features from the objects that are sufficiently similar within a class and sufficiently distinct from other classes. As a starting point for the selection of such features, several physical measures were directly extracted and calculated from the CPI images of the ice crystals:

[15] 1. the perimeter of the silhouette,

[16] 2. the area A of the crystal,

[17] 3. the maximum diameter Dmax (see Figure 2, upper right panel),

Figure 2.

(a) Original CPI image of a column aggregate. (b) The detached perimeter of the silhouette. The convex hull is demonstrated with a dashed line, and also maximum diameter Dmax and width Wmax are shown. (c) Automatically detected corners. The corresponding perimeter is shown with a dashed line. (d) The line segments s (shown in green) and the surface normal angles. Both are functions of the invariant angle γ. The pink line depicts an invariant angle of ϕ = 90°, so this figure demonstrates the calculation of the autocovariances for γ = 30°, ϕ = 90°.

[18] 4. the maximum width Wmax that is perpendicular to Dmax (also in Figure 2, upper right panel),

[19] 5. the translucent area Atr inside the perimeter.

[20] The values of these measures were not systematically different or similar for crystals of different habits, so the classification could not be based on them directly. Instead, we created sets of parameters that are derived from these extracted values. They can be categorized as area-, aspect-ratio-, and perimeter-shape-related parameters, and are introduced in more detail below.

[21] Some ice crystal silhouettes are convex, meaning the tangent of the perimeter does not intersect with the perimeter at any other point, while some have large concavities in the perimeter. Thus, the ratio A/Ach, where Ach is the area of the convex hull of the perimeter (illustrated in Figure 2, upper right panel), was a potential parameter for the classification. The crystals also show major variation in the aspect ratio, which can be quantified, for example, using the following dimensionless parameters: the ratio of the silhouette area to the area of a specific circle (i.e., A/[π(Wmax/2)^2] or A/[π(Dmax/2)^2]), the ratio of the silhouette area to that of a rectangle (A/(Wmax · Dmax)), or the ratio of the two maximum dimensions (Dmax/Wmax). Hence, these were also identified as candidates for classification parameters.
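
The area and aspect ratio parameters can be computed from an ordered list of perimeter points, for instance as in the sketch below. The helper names are hypothetical, and two details are assumptions of this example: Dmax is taken as the largest pairwise distance between perimeter points, and Wmax as the extent of the silhouette perpendicular to the Dmax chord.

```python
import math
from itertools import combinations

def polygon_area(pts):
    """Shoelace formula for a closed polygon given as ordered (x, y) tuples."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def convex_hull(pts):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def half(points):
        h = []
        for p in points:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]
    return half(pts) + half(reversed(pts))

def shape_ratios(perimeter):
    """Area and aspect ratio parameters of one silhouette."""
    A = polygon_area(perimeter)
    A_ch = polygon_area(convex_hull(perimeter))
    p, q = max(combinations(perimeter, 2), key=lambda pq: math.dist(*pq))
    D_max = math.dist(p, q)
    # Unit vector perpendicular to the D_max chord.
    nx, ny = -(q[1] - p[1]) / D_max, (q[0] - p[0]) / D_max
    proj = [x * nx + y * ny for x, y in perimeter]
    W_max = max(proj) - min(proj)
    return {
        "A/A_ch": A / A_ch,
        "A/circle_W": A / (math.pi * (W_max / 2.0) ** 2),
        "A/circle_D": A / (math.pi * (D_max / 2.0) ** 2),
        "A/rect": A / (W_max * D_max),
        "D_max/W_max": D_max / W_max,
    }
```

For an L-shaped outline, A/Ach falls below one because the hull fills the concave corner, whereas a rectangle or any other convex outline gives exactly one.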

[22] The perimeter of a crystal is discretized such that the distance between two adjacent points along the perimeter is constant. As explained by Lindqvist et al. [2010], we normalized the perimeter to 360 degrees and defined an invariant angle γ that corresponds to a certain proportion of the entire length of the perimeter. For instance, γ = 180 degrees corresponds to 50% of the total length of the perimeter. It is emphasized that γ is measured along the perimeter and not as an angle from any specific origin because this simplifies the treatment of non-starlike silhouettes, for which the latter method would provide ambiguous results. The concept of an invariant angle allowed us to study the statistical characteristics of the shapes: the perimeters of the ice crystals can vary notably even within the same habit, but statistical parameters should not be that sensitive to stochastic differences. Similarly to Muinonen [2006], we can define a line segment s(γ) as the Cartesian distance between two perimeter points separated by an invariant angle γ. Hence, the average length of the line segment ⟨s(γ)⟩ and its autocovariance cov[s(γ, ϕ)] describe the perimeter statistically with only a small number of parameters. Here we have used γ = 30, 90, and 180 degrees, and ϕ = 0, 30, 90, and 180 degrees for the invariant angles, resulting in three ⟨s(γ)⟩ and 12 cov[s(γ, ϕ)] parameters. As a plausible alternative to line segments, we also used the surface normal vectors by defining α(γ) as the angle between two surface normals of the perimeter points separated by an invariant angle γ [Muinonen, 2006]. The shape of the perimeter is then expressed by the average ⟨α(γ)⟩ and its autocovariance cov[α(γ, ϕ)]. In this case, the clearest separation between classes is achieved with invariant angles of γ = 5, 10, 30, 90, and 180 degrees, and ϕ = 0, 10, 30, 90, and 180 degrees, which leads to 30 parameters in total (5 averages and 25 autocovariances). The line segments and the surface normals are depicted in Figure 2, lower right panel, together with examples of different invariant angles.
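
The line-segment statistics can be illustrated with the following sketch for a perimeter sampled at N equidistant points, where an invariant angle maps to an index offset of N times the angle divided by 360 degrees. The function name `segment_stats` and, in particular, the circular autocovariance estimator used here are assumptions of this example; the exact estimator follows Muinonen [2006] and may differ in detail.

```python
import math

def segment_stats(perimeter, gamma_deg, phi_deg):
    """Mean of s(gamma) and its autocovariance at lag phi.

    `perimeter` is a closed curve given as N (x, y) points that are equally
    spaced along the curve.
    """
    n = len(perimeter)
    g = round(gamma_deg / 360.0 * n)    # offset that defines the segment
    k = round(phi_deg / 360.0 * n)      # lag between the compared segments
    s = [math.dist(perimeter[i], perimeter[(i + g) % n]) for i in range(n)]
    mean_s = sum(s) / n
    # Assumed estimator: circular autocovariance of s at lag k.
    cov = sum((s[i] - mean_s) * (s[(i + k) % n] - mean_s)
              for i in range(n)) / n
    return mean_s, cov
```

For a circle, s(180°) equals the diameter at every point, so the autocovariance vanishes; for a square, s(180°) varies between the side length and the diagonal, and the fourfold symmetry makes the autocovariance at ϕ = 90° equal to the variance.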

[23] In addition to the previously described area-, aspect-ratio-, and perimeter-shape-related parameters, the number of corners Ncorner in the perimeter was computed using a corner-detection algorithm presented by He and Yung [2008]. The algorithm is well-suited for this purpose since it takes into account both local and global curvature properties of the perimeter, thus allowing for some rounding of corners, which is common in the CPI images, partly due to diffraction in the imaging process and sometimes partly due to the sublimation of the crystals. An example of ice-crystal corner detection is shown in Figure 2, lower left panel. With the detected corners, the perimeter can be modeled with vectors ūi as edge lines: their average length relative to the total length of the perimeter, l̄, and the standard deviation of the relative length, σl, have characteristic values for each crystal habit. In addition, the angles between the vectors hold valuable information on the crystal morphology, revealing, for example, hexagonality and concavities. Some habits favor 90° angles while others prefer 120°, but in the case of the perimeters of real crystals, these angles vary somewhat due to the orientation of the crystal. Thus, the number of perimeter corners in the ranges of 0°–40°, 40°–75°, 75°–105°, 105°–140°, and 140°–180° was calculated. We also kept track of the number of convex and concave corners. Finally, the area of the polygon formed by the vectors is denoted by Acorner.
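
Given a list of detected corners, the edge-vector parameters can be derived as in the sketch below. The corner detection itself is not reproduced here. The function name, the counter-clockwise ordering of the corners, the half-open angle bins, and the interpretation of the angle between two edge vectors as the opening angle at the corner are assumptions of this example.

```python
import math

BINS = [(0, 40), (40, 75), (75, 105), (105, 140), (140, 180)]

def corner_features(corners, perimeter_length):
    """Edge-length statistics and binned corner angles.

    `corners` are (x, y) corner points in counter-clockwise order and
    `perimeter_length` is the length of the full, unsimplified perimeter.
    """
    n = len(corners)
    edges = [(corners[(i + 1) % n][0] - corners[i][0],
              corners[(i + 1) % n][1] - corners[i][1]) for i in range(n)]
    rel = [math.hypot(ex, ey) / perimeter_length for ex, ey in edges]
    mean_l = sum(rel) / n
    sigma_l = math.sqrt(sum((r - mean_l) ** 2 for r in rel) / n)
    counts = [0] * len(BINS)
    convex = concave = 0
    for i in range(n):
        (ax, ay), (bx, by) = edges[i - 1], edges[i]   # edges meeting at corner i
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        turn = math.degrees(math.atan2(abs(cross), dot))
        angle = 180.0 - turn                          # opening angle, 0..180
        if cross >= 0:
            convex += 1       # left turn on a counter-clockwise outline
        else:
            concave += 1
        for j, (lo, hi) in enumerate(BINS):
            if lo <= angle < hi or (j == len(BINS) - 1 and angle == hi):
                counts[j] += 1
                break
    return {"N_corner": n, "mean_l": mean_l, "sigma_l": sigma_l,
            "angle_counts": counts, "convex": convex, "concave": concave}
```

The five angle counts together with the convex and concave counts give seven values, which would be consistent with the vector length of seven listed for the angle parameters in Table 1.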

[24] All parameters introduced above are summarized in Table 1 and were identified as potential classification parameters. Later, different combinations of these were used in the PCA to determine the parameter combination that worked best in classification.

Table 1. Parameters Designed for Classification of Ice Crystal Shapes^a

| Area | Aspect Ratio | Perimeter Shape | Corner Detection |
|---|---|---|---|
| A/Ach ^b,c | A/[π(Wmax/2)^2] ^b | ⟨α(γ)⟩ (5) | Ncorner ^c |
| A/Acorner ^b | A/[π(Dmax/2)^2] | cov[α(γ, ϕ)] (25) | l̄ ^c |
| | A/(Wmax · Dmax) ^b | ⟨s(γ)⟩ (3) ^b,c | σl ^c |
| | Dmax/Wmax | cov[s(γ, ϕ)] (12) ^b,c | angles between ūi (7) ^c |

a Each parameter is a scalar, if not otherwise indicated by the number in parentheses, which specifies the vector length.
b Parameters that best classify compact crystals.
c Parameters that best classify non-compact crystals.

4. Results and Discussion

[25] Construction and testing of the automatic habit classification system involved three phases. In Phase 1, the classes were created by assembling the training data. In Phase 2, different combinations of potential classification parameters were studied with the PCA to identify the combinations that were good at separating the habits. In Phase 3, with the established combination of parameters, the optimal number of PCA dimensions and the best-performing classification technique (Bayesian or one of the kNN) were determined. This phase also included testing of the overall performance of the classifier for different data sets.

[26] In this Section, we first describe the training data that this classification is based upon. Then, we present the CPI data used in testing: one data set was used in Phase 2 defined above and three other sets were used in Phase 3. After presenting these data, Phases 2 and 3 are described in detail, ending up with a definition of the established classifier.

4.1. Training Data and Phase 1

[27] The ice crystals of the training data define the statistical properties of each class and the limits between classes. Therefore, the criteria for selecting crystals for the training data were that the habit of the crystal had to be recognizable and representative of its class. Different types within each class were included to better cover the natural variation of crystals within a habit. This means that crystals of various aspect ratios, orientations, and, in the case of aggregates, a varying number of constituents were selected. To also cover the possible variation of crystals at different latitudes, the training data were selected from the CPI images from three different field measurement campaigns: the Tropical Warm Pool International Cloud Experiment (TWP-ICE) [May et al., 2008], the March 2000 Cloud Intensive Operational Period (Cloud IOP) [Dong et al., 2002; Mauno et al., 2011], and the Indirect and Semi-Direct Aerosol Campaign (ISDAC) [McFarquhar et al., 2011]. The first 150 crystals identified in each of the eight classes were used in the training data: 50 from arctic (ISDAC), 50 from midlatitude (Cloud IOP), and 50 from tropical (TWP-ICE) ice cloud CPI data, thus giving a training data set of over 1000 ice crystals. Classes that did not have 150 crystals due to the lack of appropriate images were bullets (124), plates (103), and plate aggregates (138).

4.2. Test Data

[28] Two sets of test data were used, hereafter called the feature-test data and the actual test cases. The former set was used in Phase 2 to identify the best parameters for the classification, and the latter set in Phase 3 to test the classifiers and the optimal number of PCA dimensions to be used. The sets were independent and separate from the training data: if the training data had been used in Phases 2 and 3, the tests would have resulted in 100% classification accuracy; likewise, using the same data in Phases 2 and 3 would not have been an objective test for the performance of the classifiers.

[29] The feature-test data in Phase 2 consisted of a randomly chosen, 30-second period of CPI data from the ISDAC campaign. Since these data did not include samples from all eight habits, 17 extra crystals from TWP-ICE data were added to cover all habit classes. The feature-test data consisted of 150 crystals in total.

[30] Three actual test cases in Phase 3 were chosen because the classification system should be effective and reliable regardless of cirrus formation mechanism, latitude, temperature, or other external factors. These test cases were from the different field projects and consisted of all the crystals observed by the CPI within a short time, namely one or several minutes, depending on the number of crystals observed within that time frame. Detailed information about the times of the test cases is presented in Table 2. The crystals that were only partially imaged were removed from the data. In addition, if the contrast between the silhouette and the background was too low, the perimeter could not be correctly extracted and the image was removed. The removed images are not included in the numbers in Table 2.

Table 2. Selected Details of the Three Test Cases

| Campaign | Latitude | Date | Time | Large Crystals |
|---|---|---|---|---|
| Cloud IOP | Midlatitude | 13.03.2000 | 19:49–19:51 | 200 |

[31] All of the test crystals were first manually and independently classified by two authors as one of the eight shape classes. In some cases, where the habit of the crystal could not be unambiguously determined, two classes equally well described the crystal habit. This mainly occurred for some roundish plate aggregates that could be counted as irregular crystals, but also for some rosettes that had additional crystals attached to them, which made them suitable column aggregate candidates as well. It was also subjective how non-pristine a plate or a column crystal needed to be to be considered an irregular crystal.

4.3. Phase 2: Feature Selection

[32] When studying all potential classification parameters for the training data, it became obvious that some parameters worked better for certain classes and failed for others. For instance, parameters based on corner detection were valuable for separating rosettes from plate aggregates, but less useful for simpler shapes (e.g., bullets and columns) where one erroneous corner detection could lead to large errors. Therefore, the classification was executed in two parts. First, the crystals were divided into compact and non-compact morphologies. Bullet, plate, column, and irregular crystals were considered as compact shapes (top row in Figure 1) that were separated from the more complex habits, such as single rosettes, rosette aggregates, plate aggregates, and column aggregates, that are referred to as non-compact crystals (bottom row in Figure 1). The shapes of the compact crystals are typically rather convex, whereas concavity is a common feature among the non-compact shapes.

[33] Figure 3 illustrates how the compact and non-compact shapes can be separated based on the ratio A/Ach. Because certain non-compact shapes, especially aggregates of plates, tend to be nearly convex, the number of corners was also included as a criterion for separating the shapes. After careful testing, two conditions that must be met for compact crystals were identified: (i) A/Ach > 0.90, and (ii) Ncorner < 11. In Figure 3, nearly all (97.5%) of the black symbols that represent compact crystals are located within the region where the required conditions are fulfilled. Nevertheless, some plate aggregates still ended up being classified as compact crystals. Although their perimeter shapes can indeed be very similar to those of certain irregular crystals, the aggregates of plates are typically at least partially translucent, whereas irregular crystals are generally opaque and appear dark in the CPI images. An additional criterion was thus added to the classification. If the crystal is irregular but partly translucent (Atr/A ≥ 0.05), it is considered a plate aggregate (for example, the crystal 7b in Figure 1).
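
The two-part decision described above reduces to a pair of threshold tests, sketched here with the threshold values quoted in the text. The function names and the string labels are hypothetical and serve only to illustrate the logic.

```python
def is_compact(area_ratio, n_corner):
    """Compact if the silhouette is nearly convex and has few corners.

    area_ratio is A/Ach and n_corner is the number of detected corners.
    """
    return area_ratio > 0.90 and n_corner < 11

def apply_translucency_rule(habit, translucent_fraction):
    """Relabel partly translucent irregular crystals as plate aggregates.

    translucent_fraction is Atr/A, the translucent share of the silhouette.
    """
    if habit == "irregular" and translucent_fraction >= 0.05:
        return "plate aggregate"
    return habit
```

A crystal is therefore first routed to the compact or non-compact classifier, and the translucency test is applied only after a compact crystal has been labeled irregular.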

Figure 3.

The distribution of compact and non-compact crystals of the training data as functions of the two parameters used in separating the two morphologies. The box in the lower right corner marks the compact crystal region where A/Ach ∈ [0.90, 1.0] and Ncorner ∈ [0, 11].

[34] In the second part of Phase 2, the compact and non-compact crystals were further divided into separate habit classes based on the principal component analysis and a classification technique (kNN and Bayesian) applied to a special set of features, as described in section 4. The Bayesian probability was calculated as presented in equation (1) by assuming equal a priori probabilities for each of the classes, and the k nearest neighbors was tested with k = 1, 3, and 5; the latter two both with and without weighting by the inverse of distance. All dimensions from d = 1 to the initial value d = d_in were considered (the value of d_in varied depending on the parameter combination studied). The best feature set was determined by testing various combinations of the potential classification parameters listed in Table 1 using the feature-test data: the best combination was identified as that having the highest classification accuracy. Overall, more than 20 different, logical feature combinations were tested for both compact and non-compact crystals. The best feature combinations were different for compact and non-compact crystals, as indicated in Table 1. For the best feature set of compact crystals, the total initial number of dimensions is d_in = 19 and, for non-compact, d_in = 26. Hereafter, these features are fixed as the basis for the classification system.

4.4. Phase 3: Classification of the Test Cases

[35] Once the classification features were set, we proceeded to test the classification techniques. For this, three test cases from tropical, midlatitude, and arctic ice clouds (reviewed in Table 2) were classified using all the techniques. The goal was to identify the best performer and the optimal number of PCA dimensions. Similarly to Phase 2, the results of the automatic classification for each test case were compared to the manually determined reference classes.

[36] In the tests, the performance of the classification techniques varied, depending on the test case, from 0% to nearly 100%: it was, in all cases, highly dependent on the number of dimensions d, i.e., principal components, taken into account, and on the classification technique. Classification accuracies achieved using the Bayesian and the weighted 3NN and 5NN techniques for all three test cases are shown in Figure 4 as a function of d. Results for the 1NN and the unweighted 3NN and 5NN techniques are not shown; their performance was almost always worse than that of the weighted 3NN and 5NN, exceeding the accuracy of those at only one value of d in the arctic compact and midlatitude non-compact test cases. For compact crystals, the Bayesian and the k nearest neighbors techniques yielded very different results at the same number of dimensions. It is especially noteworthy that the accuracy of the Bayesian technique varies from 15% to 70% for d ≤ 6 and deteriorates to zero when increasing the number of dimensions, although dimension reduction generally implies a loss of information and would therefore be expected to lead to a lower accuracy at small values of d (which is seen for the kNN techniques). One explanation for the deterioration of the Bayesian accuracy is that equation (1) assumes that the PCA-transformed parameters follow the Gaussian distribution; it is plausible that when considering multiple dimensions, this assumption becomes less valid. The classification accuracy for non-compact crystals appears to exceed 60% for all test cases and techniques when considering d = 13; when increasing the number of dimensions, the accuracy is, for most cases, increased. However, non-compact arctic crystals are best classified with a small number of principal components (d ≤ 10). This could be due to the fact that this particular period of CPI images included a large fraction of very transparent crystals, which sometimes led to erroneously detected perimeters. Such shapes would not naturally belong to any of the habits.

Figure 4.

The performance of different classification techniques for the three test cases: tropical (pink), midlatitude (black), and arctic (blue) as a function of the number of principal components considered for (a) compact crystals and (b) non-compact crystals. The line type specifies the classification technique: solid for the 3NN, dash-dot with circles for the 5NN, and dashed for the Bayesian.

[37] The highest classification accuracies for each test case are summarized in Table 3, together with the technique and d with which they have been achieved. If the same accuracy was obtained for several different d, the result corresponding to the smallest d was chosen as the best. Results in Table 3 indicate that if the classifier was optimized individually for these test cases, its accuracy would vary from 86.0% to 88.5%, combining compact and non-compact results for the three cases. Such optimization is, however, only possible when the classes are manually specified. Our goal was to determine the generally best-performing classification technique and the optimal number of dimensions. These were identified by summing the number of correctly classified crystals for each technique and for each value of d for all three test cases. It was found that the largest number of correct classifications was achieved, for both compact and non-compact crystals, by the weighted 5NN technique, and considering the 17 and 24 most significant dimensions for compact and non-compact crystals, respectively.

Table 3. The Highest Classification Accuracies for the Three Test Cases Together With the Classification Technique and Number of PCA Dimensions (d) Used for Achieving Them
Test Data | Correct, Compact | Technique | d | Correct, Non-Compact | Technique | d
Tropical | 37/46 (80.4%) | 5NN | 17 | 154/176 (87.5%) | 5NN | 24
Midlatitude | 10/10 (100%) | 5NN | 17 | 167/190 (87.9%) | 5NN | 25
Arctic | 49/61 (80.3%) | 3NN | 13 | 128/140 (91.4%) | Bayes | 4
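The range of 86.0% to 88.5% quoted for individually optimized classifiers can be checked directly from the counts in Table 3 by pooling the compact and non-compact crystals of each test case. The short script below is only an arithmetic check; the variable and function names are illustrative.

```python
# (correct, total) pairs for compact and non-compact crystals, from Table 3.
table3 = {
    "tropical": ((37, 46), (154, 176)),
    "midlatitude": ((10, 10), (167, 190)),
    "arctic": ((49, 61), (128, 140)),
}

def combined_accuracy(compact, non_compact):
    """Pool compact and non-compact counts; return (correct, total, percent)."""
    correct = compact[0] + non_compact[0]
    total = compact[1] + non_compact[1]
    return correct, total, 100.0 * correct / total

for case, (compact, non_compact) in table3.items():
    correct, total, pct = combined_accuracy(compact, non_compact)
    print(f"{case}: {correct}/{total} = {pct:.1f}%")
```

This gives 191/222 = 86.0% for the tropical, 177/200 = 88.5% for the midlatitude, and 177/201 = 88.1% for the arctic test case, and the totals reproduce the 222, 200, and 201 crystals of the three test cases.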

[38] The habit distribution generated as a result of the classification is shown in Figure 5. The crystal compositions vary greatly for the three locations considered. The accuracy of this classification for each field project is somewhat lower than the best possible individual accuracies in Table 3; with 5NN, d = 17 for compact, and d = 24 for non-compact, the classification accuracies were 86.0% for tropical, 87.5% for midlatitude, and 69.2% for arctic test cases. The overall combined accuracy was 81.1%. The accuracies are quite high for all test cases, which signifies the versatility and good applicability of the automatic classification system, hereafter called the IC-PCA (Ice-crystal Classification with Principal Component Analysis). A summary of the IC-PCA classification procedure is provided in Figure 6.
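The combined accuracy is consistent with weighting the three per-case accuracies by the number of crystals in each test case (222 tropical, 200 midlatitude, and 201 arctic). Because the per-case accuracies are rounded, the check below is approximate:

```latex
\[
\frac{0.860 \times 222 + 0.875 \times 200 + 0.692 \times 201}{222 + 200 + 201}
\approx \frac{190.9 + 175.0 + 139.1}{623}
\approx 0.811
\]
```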

Figure 5.

Ice-crystal habit distribution for the three test cases: arctic, midlatitude, and tropical.

Figure 6.

Classification procedure of a single ice crystal in the IC-PCA.

[39] Past studies [Field et al., 2003; McFarquhar et al., 2007; Lawson, 2011; Korolev et al., 2011] have shown that some ice crystals detected by in-situ microphysical probes may be shattered artifacts rather than naturally occurring ice crystals. In fact, Korolev et al. [2011] showed that some particles as large as 500 μm may be shattered remnants. The degree of shattering depends heavily on the shape of a probe's tips and inlets, as well as on the sizes and habits of particles, temperature, aircraft speed, attitude, pitch and roll. Since the shapes of the CPI tips and inlets differ from those of the probes used in the aforementioned studies and since the observations discussed in this study were obtained in different cloud conditions, it is not possible to apply the results of these past studies to estimate the proportion of crystals measured by the CPI that are shattered artifacts. Nevertheless, Um and McFarquhar [2011] suggested that multiple particles imaged in the same CPI frame were more likely remnants of a large particle shattering on the inlet than were single particles captured in one frame because shattered remnants have shorter interarrival times. They showed that over 98.4% of ice crystals imaged in tropical cirrus during the TWP-ICE were the only particles in the CPI frame, suggesting shattering was not a problem for the tropical data set. A similarly comprehensive analysis has not been performed for the arctic and mid-latitude data sets. 
However, the habit classification scheme developed here is still germane for several reasons: (1) the scheme can classify habits of detected particles regardless of whether the particles are shattered remnants or real ice crystals; (2) given that particles break up upon impact with probe tips, it is unlikely that shattered artifacts would have the pristine shapes that are used in the classification scheme; and (3) most shattered remnants will be smaller particles, and hence the selected cut-off criterion of 100 μm based on the resolution of the CPI should remove most of the shattered remnants.

5. Conclusions

[40] An automatic ice-cloud particle habit classifier IC-PCA (Ice-crystal Classification with Principal Component Analysis) was developed to derive ice-crystal habit distributions from Cloud Particle Imager (CPI) images of single ice crystals obtained from atmospheric ice clouds in situ. For developing and testing the classifier, data collected during the ISDAC, Cloud IOP, and TWP-ICE field projects at arctic, midlatitude, and tropical environments were used.

[41] The IC-PCA classifies large crystals (Dmax ≥ 100 μm) into eight common habits: plates, bullets, columns, irregular shapes, rosette aggregates, bullet rosettes, plate aggregates, and column aggregates. Smaller crystals are automatically excluded from the classification. The IC-PCA classification procedure for a single image, summarized in Figure 6, is more efficient and objective than manual classifications and is perfectly repeatable. With the IC-PCA, an average classification accuracy of 81.1% was achieved for test cases that consisted of 623 crystals in total.

[42] At present, the IC-PCA assigns a single habit to each crystal. However, there are crystals that cannot be unambiguously identified as one specific habit, even through manual inspection. Future improvements to the IC-PCA should identify such difficult crystals automatically so that they can be subsequently classified manually. Indeed, several attempts to this end were made. For example, we studied the Bayesian probabilities for correctly and incorrectly classified crystals, the distances to the nearest neighbors, cases in which the Bayesian and the nearest-neighbor classifiers reported different habits, and whether all nearest neighbors had the same habit. Unfortunately, none of these methods worked satisfactorily. Thus, at present, the IC-PCA classification has to be taken as it is; it cannot be easily improved on by manually classifying the difficult cases.

[43] Habit distributions of ice crystals are crucial in estimations of, for instance, the radiative impact of cirrus clouds. Therefore, any uncertainties in the habit distributions propagate into the radiative flux computations. This was demonstrated for an arctic ice cloud by carrying out simplistic example computations of radiative properties (presented in the Appendix) for both manually and automatically derived habit distributions of the same image data. Shortwave flux differences between these two cases turned out to be notable but, interestingly, two manual classifications of the same image data resulted in even larger differences, for example 21% for the diffuse upward flux at the top of the atmosphere. According to Vogelman and Ackerman [1995], flux differences exceeding 5% have significant impacts in climate considerations. Based on this, it would seem that even manual classification is not sufficiently accurate for cloud radiative effect simulations. However, such a conclusion would be hasty. For example, in our simulations, only short periods of in situ data were used to derive the habit distributions, which were then assumed to apply throughout the entire cloud. The simulations also did not account for the small ice crystals. In real ice clouds, shape-size distributions vary considerably both vertically and horizontally, and sizes and shapes are not independent. Our simulations are therefore unlikely to quantify the radiative flux errors accurately and reliably. Rather, the important conclusion is that the errors resulting from the IC-PCA classification do not seem to exceed the uncertainties inherent to the manual classification, so there does not appear to be any reason to classify habits manually.

[44] Ultimately, the radiative properties of ice crystals depend on the three-dimensional shape of the crystal rather than the two-dimensional shape of a silhouette. Even though these two are naturally connected, the classification of silhouettes cannot produce a 100% accurate shape classification for three-dimensional objects. An interesting application of the IC-PCA would be to test whether the silhouettes of random three-dimensional crystal models used to compute the single-scattering properties have statistical properties similar to those of observed ice crystals. In the future, it may be possible to measure the three-dimensional shapes directly and use these shapes in radiation computations.

Appendix A: Effect of Classification Uncertainty on Radiative Fluxes

[45] To demonstrate the impact of ice crystal classification uncertainty on the radiative properties of the ice clouds, we conducted a simplistic sensitivity study with a radiative transfer model. The exercise focused on the arctic test case with the lowest classification accuracy. The radiative properties computed with the IC-PCA habit distribution were compared against those computed with the two manually derived habit distributions which were based on the same CPI data, but where different habits were assigned for some crystals that could not be unambiguously identified. These are referred to as manual I and manual II.

[46] The distributions were first combined with libraries of habit-dependent single-scattering properties, obtained from existing databases: plates, (solid) columns, rosettes (with 6 bullets), and (smooth) column aggregates, from Yang et al. [2000], and plate aggregates from Um and McFarquhar [2009]. Since the scattering properties for bullets, rosette aggregates, and irregular crystals are not included in these databases, the (solid) column database was used as the closest approximation to bullets and the 6-bullet rosette database for rosette aggregates. Irregular crystals were excluded because their scattering properties are not well known. The asymmetry parameter g and area ratio AR, which denotes the ratio of the particle cross-sectional area to the area of a circle with a diameter equal to the maximum dimension of the crystal [McFarquhar and Heymsfield, 1996], were extracted from these databases. The single-scattering albedo ϖ was also available. However, to a good approximation for the visible wavelengths, ϖ = 1. The mean values for g and AR were calculated for the size-shape distributions of the IC-PCA, manual I, and manual II classifications of the arctic test case.

[47] The mid-visible radiative fluxes at the surface and at the top of the atmosphere (TOA) were then computed using the libRadtran radiative transfer package [Mayer and Kylling, 2005] with the DIScrete Ordinate Radiative Transfer model (DISORT) by Stamnes et al. [1988] as the radiative transfer solver. The flux computations were conducted for a wavelength band between 500 nm and 600 nm. The U.S. standard atmosphere [Anderson et al., 1986] was used in each simulation to account for the molecular scattering and absorption. The day of the year, which determines the incoming solar radiation, was chosen to be 278, the surface albedo was set to 0.2, and the solar elevation angle was varied from 0° to 90°. The clouds, described by vertical profiles of ensemble-averaged g, ϖ, and layer values of optical thickness, were assumed to extend from 6 km to 10 km and to be both vertically and horizontally homogeneous. The cloud optical thickness τ depends on the number concentration of the ice crystals as well as the extinction cross section Cext of each crystal shape and size. In these simulations, τ was fixed at 4, 8, and 14, and these values were multiplied by the obtained area ratios. This way the optical thickness depends on the shape but not on the size or the concentration of the crystals. After multiplying by the area ratios, the optical thickness of the test clouds varied from τ = 0.7 to τ = 3.4.
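The scaling of the optical thickness by the area ratio can be sketched as below. The habit fractions and per-habit area ratios are hypothetical placeholders rather than values from the databases, and the simple fraction-weighted mean (renormalized after excluding irregular crystals) is an assumption about how the ensemble average is formed.

```python
def ensemble_mean(fractions, values):
    """Fraction-weighted mean of a per-habit quantity, with weights renormalized."""
    total = sum(fractions.values())
    return sum(fractions[h] * values[h] for h in fractions) / total

def scaled_optical_thickness(tau_nominal, mean_area_ratio):
    """Nominal optical thickness multiplied by the ensemble-mean area ratio."""
    return tau_nominal * mean_area_ratio

# Hypothetical habit fractions (irregular crystals already excluded)
# and hypothetical per-habit area ratios.
fractions = {"plate": 0.2, "column": 0.3, "bullet rosette": 0.5}
area_ratio = {"plate": 0.30, "column": 0.20, "bullet rosette": 0.18}

mean_ar = ensemble_mean(fractions, area_ratio)
taus = [scaled_optical_thickness(t, mean_ar) for t in (4, 8, 14)]
```

With these placeholder numbers the mean area ratio is about 0.21, so the scaled optical thicknesses are about 0.84, 1.68, and 2.94. The reported range of τ = 0.7 to τ = 3.4 would correspond to mean area ratios between roughly 0.7/4 ≈ 0.18 and 3.4/14 ≈ 0.24, assuming the extremes come from the smallest and largest nominal τ.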

[48] Figure A1 shows the shortwave downward fluxes (both direct and diffuse) at the surface and the diffuse upward flux at the top of the atmosphere (TOA) for the smallest and largest τ considered. From the comparison, clear differences in the fluxes can be seen, and the flux results based on the IC-PCA lie between those of the manually classified cases. For instance, for the direct and diffuse fluxes at the surface, uncertainties around 10 Wm−2 were found to be connected to the classification accuracy. For the diffuse upward flux at the TOA, the corresponding uncertainties were between −3 and +7 Wm−2. The relative differences between the two manual classifications were notable: for instance, the diffuse upward flux at the TOA was, at highest, 47 Wm−2 for manual I and 57 Wm−2 for manual II, showing a 21% difference.
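The quoted 21% is reproduced when the difference is taken relative to the manual I flux, which appears to be the reference used here:

```latex
\[
\frac{F_{\mathrm{manual\ II}} - F_{\mathrm{manual\ I}}}{F_{\mathrm{manual\ I}}}
= \frac{57 - 47}{47} \approx 0.21
\]
```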

Figure A1.

The top row shows the SW direct downward flux at the surface, the middle row the SW diffuse downward flux at the surface, and the bottom row the diffuse upward flux at the top of the atmosphere (TOA), each as a function of the solar elevation angle (in degrees). The unit of the fluxes is Wm−2. The optical thickness of the clouds increases in panels from left to right: the variation of τ between Manual I, IC-PCA, and Manual II is denoted on the top.


[49] The authors wish to thank Antti Penttilä, Petri Koistinen, Jari Valkonen, and Inka Juntheikki-Palovaara for their valuable contributions at the early stages of this work. Andreas Macke and two anonymous referees are acknowledged for their suggestions for improving the manuscript. The work was partially funded by the Academy of Finland (contracts 125180 and 127461) and partly supported by the Office of Biological and Environmental Research (BER) of the U.S. Department of Energy (DE-FG02-02ER63337, DE-FG02-07ER64378, DE-FG02-09ER64770, and DE-SC0001279) as part of the Atmospheric Systems Research and Atmospheric Radiation Measurement (ARM) Airborne Facilities. Data were obtained from the ARM program archive, sponsored by the U.S. DOE, Office of Science, BER, Environmental Sciences Division.