Journal of Geophysical Research: Atmospheres

Convective cloud identification and classification in daytime satellite imagery using standard deviation limited adaptive clustering

Abstract

[1] This paper describes a statistical clustering approach to the classification of cloud types within meteorological satellite imagery, specifically visible and infrared data. The method is based on the Standard Deviation Limited Adaptive Clustering (SDLAC) procedure, which has been used to classify a variety of features within both polar-orbiting and geostationary imagery, including land cover, volcanic ash, dust, and clouds of various types. In this study, the focus is on classifying cumulus clouds of various types (e.g., "fair weather," towering, and newly glaciated cumulus, in addition to cumulonimbus). The SDLAC algorithm is demonstrated through examples using the Geostationary Operational Environmental Satellite (GOES) 12, Meteosat Second Generation's (MSG) Spinning Enhanced Visible and Infrared Imager (SEVIRI), and the Moderate Resolution Imaging Spectroradiometer (MODIS). Results indicate that the method performs well, classifying cumulus similarly between MODIS, SEVIRI, and GOES despite the obvious channel and resolution differences between these three sensors. The SDLAC methodology has been used in several research activities related to convective weather forecasting, which offers some proof of concept for its value.

1. Introduction

[2] The main motivation for this paper is the development, evaluation, and application of a "cumulus cloud mask" (CCM) as a means to isolate only cumulus clouds in a given satellite scene. The approach used to develop the CCM is a very flexible algorithm, and this is only one potential application (others are discussed near the end of this paper). For this study, data from the Geostationary Operational Environmental Satellite (GOES; GOES 12 in particular), the Moderate Resolution Imaging Spectroradiometer (MODIS), and Meteosat Second Generation's (MSG) Spinning Enhanced Visible and Infrared Imager (SEVIRI) instruments are used. This paper investigates new techniques for detecting features in data from these three meteorological satellites via an unsupervised clustering approach. The identification procedure is performed through statistical clustering, which invokes expert labeling of defined "clusters" of pixels with similar properties in the visible (VIS) and infrared (IR) spectrum. Clusters that are identified as cumulus clouds in various stages of growth are labeled as such across multiple satellite scenes. Once trained, the algorithm automatically highlights cumulus in any satellite image to which the method is applied. In the present study, GOES 12, MSG SEVIRI, and MODIS imagery for similar scenes are presented to show the relative portability of this method to various satellite platforms.

[3] As proof of functionality, the CCM has been tested for several years as part of the Satellite Convection Analysis and Tracking (SATCAST) System. SATCAST currently provides 0- to 1-h forecasts of convective initiation (CI) using Imager data from GOES satellites [Mecikalski and Bedka, 2006]. SATCAST relies on determining the location of cumulus in varying stages of growth, from "fair weather" (e.g., cumulus humilis) to "towering" cumulus. Once these clouds have been identified (among the many other cloud types within a visible satellite image), focused monitoring of the Lagrangian evolution of 1-km-resolution cumulus can be done as they grow, glaciate, precipitate, and produce lightning. Cumulus cloud tracking is accomplished using the Bedka and Mecikalski [2005] methodology for producing very dense, "mesoscale" atmospheric motion vectors (AMVs). The Bedka and Mecikalski [2005] AMV procedure follows from the more traditional "cloud motion" winds work of Velden et al. [1997, 1998] and Dunion and Velden [2002].

[4] Several aspects of the GOES data stream are used within SATCAST to form satellite infrared–CI relationships: cloud growth rates via IR cloud top temperature (TB) changes, ∂(10.7 μm TB)/∂t, and glaciation via the 10.7 μm TB, the 13.3–10.7 μm difference, and ∂(13.3–10.7 μm)/∂t [Mecikalski and Bedka, 2006]. Currently, the SATCAST procedure is being tested within several systems that monitor for CI on 0- to 2-h timescales as part of the Advanced Satellite Aviation Weather (ASAP) initiative [Mecikalski et al., 2007]. These include the "AutoNowcaster" [Mueller et al., 2003] and the deterministic 0- to 2-h Tactical Convective Weather Forecast (TCWF) within the FAA's Corridor Integrated Weather System (CIWS), currently operated by MIT Lincoln Laboratory in a concept exploration mode [Evans et al., 2004]. The CIWS system is used by air traffic personnel to improve traffic flow on jet routes and to provide proactive reroutes when convective weather reduces route capacity [Wolfson et al., 2004]. SATCAST is also being developed for use with the Meteosat Second Generation (MSG) satellite. Again, proof of concept for the CCM, as presented herein, comes partly from the use of CI nowcasts by the SATCAST algorithm within existing, sophisticated systems, as well as by various end users including regional National Weather Service Forecast Offices (NWSFO). We refer the reader to Mecikalski and Bedka [2006] and Mecikalski et al. [2007] for examples of SATCAST output. Validation of SATCAST has recently been performed by Mecikalski et al. [2008].

[5] The clustering algorithm developed here is tested on the VIS and IR channels of GOES (4 IR channels), MODIS (18 IR channels of the possible 36 available), and MSG SEVIRI (9 IR channels) as a means of comparing how this method may be applied to real-time imagery.

[6] This paper proceeds as follows: Section 2 provides background into classification methods designed for use with satellite-based data. Section 3 describes the data used in this study, while section 4 outlines the methodology. Examples are presented in section 5 and the study is discussed and concluded in section 6.

2. Background

[7] Using aircraft and satellite observations and numerical cloud models, a number of studies have examined various aspects of cloud field structure [Plank, 1969; Welch et al., 1988; Cahalan and Joseph, 1989; Ramirez and Bras, 1990; Sengupta et al., 1990; Joseph and Cahalan, 1990; Weger et al., 1992; Zhu et al., 1992; Kuo et al., 1993; Nair et al., 1998; Han and Ellingson, 1999; Nair et al., 2001]. Identification of a particular cloud type (or of types within a given class of clouds, e.g., stratus, cumulus, cirrus) has many potential applications. One, as noted above, is in aviation systems that attempt to forecast hazards related to thunderstorms, such as the AutoNowcaster and CIWS. The hazards include turbulence, strong wind shear, reduced visibility, and heavy precipitation. Other areas in which delineation of atmospheric features is important include the identification of aerosols and volcanic ash clouds, given that aerosols and ash come in a variety of forms and compositions. Aerosols have proven effects on human and plant health [Pope et al., 1995], while volcanic ash is an aviation safety hazard [Casadevall, 1993].

[8] There are other areas that would benefit from improved understanding of boundary layer cumulus cloud field structure, such as the parameterization of shallow convection in general circulation models (GCMs). Since shallow convection is strongly influenced by surface characteristics, areas of significant land use change are expected to show indications of local climate modification [Nair et al., 2000]. Statistical descriptions of the temporal variability of cloud field properties, useful for parameterization in larger-scale models, are not currently available [Joseph and Cahalan, 1990; Nair et al., 1998; Han and Ellingson, 1999]. Studies suggest that microphysical properties of cloud fields may be linked to atmospheric dynamic and thermodynamic structure [Sykes and Henn, 1989; Weckwerth et al., 1997; Rosenfeld et al., 2008]. Knowledge of such relationships would be useful for the parameterization of cloud field characteristics in large-scale models, as well as for short-term (0–6 h) weather forecasting. Satellite data are ideal for furthering our understanding of boundary layer cumulus cloud field characteristics. In particular, Landsat, ASTER, and Ikonos imagery can resolve high-resolution features (30–100 m), MODIS data can resolve features from 250 m to 1 km, and geostationary data can resolve cloud properties at 1–4 km with the added value of measuring their temporal variations. Note that the analysis of large volumes of data is necessary to develop good statistical descriptions of cloud field structure. This requires the use of automated methods for identifying cumulus cloud fields in satellite imagery.

[9] Supervised classification techniques utilizing neural networks, maximum likelihood, and other algorithms have proven successful in cloud detection and classification [Baum et al., 1997; Berendes et al., 1999]. Traditional supervised techniques require the construction of a large database of training samples that must be selected from imagery and correctly labeled by an expert analyst. Accurate and consistent labeling of training samples can be a tedious, time-consuming, and error-prone process. As a basis for improved efficiency, studies that capitalize on the textural and spectral signatures of clouds have been used to classify clouds in high-resolution Landsat multispectral imagery [see, e.g., Chen et al., 1989; Wielicki and Welch, 1986; Welch and Wielicki, 1989]. However, the classification schemes presented in these studies suffer from the limitation that they are trained for a specific environment, such as maritime [Bankert, 1994], tropical [Shenk et al., 1976; Inoue, 1987], or polar [e.g., Ebert, 1987, 1989; Key and Barry, 1989; Key, 1990; Welch et al., 1989, 1992; Rabindra et al., 1992; Tovinkere et al., 1993] regions. Peak and Tag [1992] developed a cloud classification method based on a cloud type database, image segmentation, and neural network identification. The Peak and Tag [1992] approach uses VIS and IR data from the GOES and Advanced Very High Resolution Radiometer (AVHRR) satellites. Tag et al. [2000] described an automated cloud classification system using AVHRR data for use in the interpretation of synoptic-scale events as an aid in forecasting. Their approach is based on a 1-nearest neighbor classifier, in which comparisons are made to a cloud type database selected by human experts, hence comprising a supervised classification system. A method similar to the Tag et al. [2000] approach has been implemented on real-time GOES imagery (see http://www.nrlmry.navy.mil).

[10] Unsupervised classification techniques utilize various distance or similarity metrics in order to segment imagery into spatially or spectrally similar groups or clusters. Cluster membership statistics are computed from sample imagery and clusters are determined on the basis of a cluster distance or similarity metric. Resultant clusters are initially unlabeled and must be identified and labeled by an expert.

[11] The Standard Deviation Limited Adaptive Clustering (SDLAC) technique is an iterative, statistically based method that is similar to the ISODATA clustering algorithm [see Gonzalez and Wintz, 1977]. However, unlike ISODATA, SDLAC does not require initial guesses for cluster centers. The SDLAC algorithm implements an iterative standard deviation threshold adjustment along with several iterative adjustments to the cluster centers. In essence, new clusters are created during each iteration of the algorithm as the cluster centers and thresholds are refined. The SDLAC method has been applied successfully for detection and classification purposes. A convective cloud mask has been created using GOES data. SDLAC has been used on MODIS to detect volcanic ash plumes and to identify tropopause penetration from deep convective updrafts (also known as “overshooting tops”).

[12] The physical information one can glean from the clustering algorithm lies in the composition of the cluster centers themselves. For example, the "cumulus cloud" class is defined by a set of clusters, with each cluster consisting of a mean and standard deviation for each channel (or feature, i.e., channel difference, ratio, texture, etc.). Upon decomposition of a cluster, the range of each individual feature is determined. One particular "cumulus" cluster may, for instance, have a high GOES channel 1 (visible) value with a medium standard deviation, together with a large channel 1 contrast value and a large contrast standard deviation. This type of information could be interpreted physically, yet in practice it may be difficult to know which channels are most important in defining a particular class.

[13] One of the properties of SDLAC is that several clusters may be formed for the same class. Sometimes the defining difference between clusters is produced by variations in a particular feature that is not of interest for the given application. For instance, in the CCM there is no need to differentiate between land, water, and other background classes based upon reflectance, since we are only concerned with clouds; however, since reflectance is a feature, extraneous clusters are formed because of variations in reflectance, to which the land surface contributes. Fortunately, extraneous clusters can either be merged together into meaningful classes or removed during the cluster labeling process.

[14] The SDLAC clustering algorithm is an unsupervised classification technique that groups statistically similar pixels in the data. Our implementation of the algorithm is not designed to determine which features are most important (physically) for a given class, but simply finds the best statistical fit for each cluster. It is important to note that other algorithms, such as ontologies and decision trees in which the classification is defined physically from the start, are much better suited for direct physical interpretation.

[15] Feature selection is one way to algorithmically determine important features for defining specific classes. This method provides insight (thresholds, ranges, etc.) into the effect of specific features on the classification accuracy. In order to do feature selection, however, one requires labeled training data, which is also a requirement for supervised techniques. Clustering algorithms can be used to effectively "bootstrap" training data, yet that is beyond the scope of this study. The strength of the clustering algorithm is its automated nature, which does not require a preconceived definition of each class. Clustering allows pixels to be grouped "naturally" according to the input features used. Clusters can then be grouped together into logical classes, which is much easier than selecting training samples for individual classes as required for supervised learning techniques.

3. Data

[16] The data used in this study from GOES 12 are one VIS channel (central wavelength 0.65 μm) and four IR channels (3.9, 6.5, 10.7, and 13.3 μm). All GOES 12 infrared data are linearly interpolated to 1-km resolution (from the native 4-km IR resolution) so that all spectral channels have the same spatial resolution for processing. On SEVIRI, all 12 channels are used: one broadband high-resolution VIS channel (central wavelength 0.7 μm), three visible and near-infrared channels (0.6, 0.8, and 1.6 μm), and eight IR channels (3.9, 6.2, 7.3, 8.7, 9.7, 10.8, 12.0, and 13.4 μm). The SEVIRI broadband visible channel is at its native 1-km resolution, and all other channels are replicated to 1-km resolution. From MODIS, the 1-km-resolution channels used consist of the following wavelengths: 0.64, 1.37, 3.75, 3.96, 6.72, 7.3, 8.5, 9.7, 11.0, 12.0, and 13.3 μm. The MODIS channels were chosen to be similar to the channels available on SEVIRI. The use of all 1-km data allows us to compare clustering and cloud identification results without resolution-related issues.

[17] In addition to the spectral channels, a new feature was created by computing an 11 × 11 neighborhood gray level difference vector (GLDV) based spatial texture for each pixel in the high-resolution visible channel of each instrument. Gray level differences are computed over the 11 × 11 pixel grid using a distance of 1 pixel with angles of 0°, 45°, and 90°, producing a gray level difference vector containing the counts of each gray level difference normalized by the number of differences computed. The "contrast" measure is defined as

CON = Σ_{i=0}^{n−1} i² GLDV(i),

where n is the number of gray level differences. The GLDV contrast feature is useful for finding cumulus and stratocumulus clouds [Welch et al., 1988].

[18] The training phase of the classifiers utilizes imagery that contains clouds of various types with emphasis on convective clouds. The training data sets for the classifiers consist of daytime imagery from over 450 GOES 12 scenes, 42 MODIS scenes, and 75 MSG scenes. The MODIS classifier was compared to coincident GOES 12 and SEVIRI images as a means of demonstrating that similar cloud identification can be obtained from different sensors with different spectral channels.

4. Methodology

[19] Image clustering using SDLAC begins with the selection and computation of image features. Features are imager data specific, but may include imager channels, channel differences and ratios, vegetation indices, textures, and other derived features. Features are normalized to a fixed range (0–255) on the basis of either a computed or specified physical range.

[20] SDLAC clustering of a satellite scene is performed on a pixel-by-pixel basis. First, a subset of the pixels in the scene is selected for processing. The first pixel processed forms the seed for the first cluster. As additional pixels are processed, they are tentatively added to each existing cluster. Tentative feature means μ*k(q) and standard deviations σ*k(q) are computed for each cluster k according to

μ*k(q) = [nk μk(q) + Xq(i*,j*)] / (nk + 1)

σ*k(q) = {[nk(σk(q)² + μk(q)²) + Xq(i*,j*)²] / (nk + 1) − μ*k(q)²}^(1/2)

where k is the cluster, nk is the number of pixels in cluster k, q is the feature (i.e., channel), and Xq(i*,j*) is the normalized value of feature q at pixel (i*,j*). The tentative feature means and standard deviations are used as a basis for constructing clusters and determining cluster membership.

[21] The SDLAC method does not require any initial guesses at the number and position of the image clusters. Instead of initial cluster guesses, initial standard deviation thresholds θ(q) are specified for each feature. The standard deviation thresholds are arbitrary numbers between 0 and 255. Large θ(q) values produce fewer clusters with larger standard deviations while smaller θ(q) values produce more numerous, tighter clusters. Typically, smaller θ(q) values are chosen initially, and during the iterative clustering algorithm the θ(q) values are gradually increased until a stable cluster set is formed.

[22] The standard deviation limiting function, Θk(q), one of the two metrics [Levine and Shaheen, 1981] used to determine cluster membership, is defined by

Θk(q) = Sk(q) θ(q),

where

Sk(q) = 1 − σ*k(q)/C(q).

Also, C(q) is a scale factor chosen to ensure that the ratio in Sk(q) is less than 1, and θ(q) is an adapted standard deviation threshold.

[23] We compute Θk(q) by scaling the standard deviation thresholds, θ(q), by a standard deviation limiting scale factor Sk(q). As the standard deviation becomes larger, Sk(q) decreases the θ(q) values, preventing larger clusters from absorbing too many pixels. The decrease of θ(q) has the effect of starting new nearby clusters, which effectively splits larger clusters. Conversely, when the standard deviation is low, the smaller cluster retains most of the θ(q) value and new clusters do not form as easily.

[24] The change in cluster means, Δμk(q), is the other clustering metric used in conjunction with Θk(q). It is defined by

Δμk(q) = |μ*k(q) − μk(q)|.

A single clustering iteration is accomplished by processing all of the selected pixels in a scene using the following algorithm.

[25] 1. For each selected pixel Xq(i*,j*), compute the tentative feature means μ*k(q) and standard deviations σ*k(q).

[26] 2. Compute standard deviation limiting function, Θk(q).

[27] 3. Compute change in cluster means, Δμk(q).

[28] 4. Determine cluster membership as follows: (1) If Δμk(q) ≤ Θk(q) for all features q in only one cluster k, merge the pixel with cluster k and update the cluster statistics: μk(q) = μ*k(q) and σk(q) = σ*k(q). (2) If more than one cluster satisfies case 1, merge with the cluster producing the minimum Σ[Δμk(q)]² and update the cluster statistics as in case 1. (3) If no cluster satisfies case 1 or 2, create a new cluster (seed).

[29] Steps 1–4 are repeated until all pixels are processed.

[30] After a single pass of the clustering algorithm, we are left with a set of clusters based upon our arbitrary initial θ(q) values. Since the SDLAC algorithm does not initially have knowledge of the actual distribution of the clusters in the image, we must devise a method of determining optimal clustering performance.

[31] Like most clustering techniques, SDLAC is an iterative algorithm. The stability of the clusters is evaluated by counting the number of pixels that have changed cluster membership since the previous iteration.

[32] At the end of each iteration, the following procedure is applied: (1) For each cluster, update the feature means μk(q) and standard deviations σk(q) using only the pixels added to the cluster during the current iteration; the updated clusters are used as seeds for the next iteration. (2) Small clusters containing fewer than a specified number of pixels are removed. (3) Clusters with standard deviations of zero in several features usually represent image noise or data errors and are removed. (4) The standard deviation thresholds, θ(q), are incremented to aid convergence. (5) If the percentage of pixels that have changed cluster membership since the previous iteration is less than a specified amount, the image has been clustered successfully and iterations are stopped. (6) Otherwise, iterations continue until the image is successfully clustered or the procedure fails after the number of iterations exceeds a specified value.

[33] In practice, it is usually possible to achieve a successful clustering of an image by adjusting initial θ(q) and increment values. The initial θ(q) and increment values are data specific, and must be adjusted by experimentation to achieve a good clustering. If the initial values are too low, extraneous clusters are formed which may slow down the convergence process. If the initial values are too high, small but important clusters may be merged with larger clusters and class resolution may be lost. Since the values θ(q) can be incremented during iteration, it is very effective to select slightly lower initial values and allow the algorithm to find the optimal value during the iteration process.

[34] The final step in the creation of a cluster-based classifier is the “blending” of the clusters from individual images into a single combined cluster set. By considering clusters as two separate normal distributions, the mean and standard deviation update equations for the merging of cluster l into cluster k can be written as

μ′k(q) = [nk μk(q) + nl μl(q)] / (nk + nl)

σ′k(q) = {[nk(σk(q)² + μk(q)²) + nl(σl(q)² + μl(q)²)] / (nk + nl) − μ′k(q)²}^(1/2),

where μ′k(q) and σ′k(q) denote the blended statistics and the merged cluster contains nk + nl pixels.

[35] The new mean and standard deviation update equations are applied and blending of the clusters proceeds using the same iterative algorithm that is applied to cluster the pixels in individual scenes. The resulting cluster set represents all of the scenes of the training image set and provides the basis for the creation of a nonscene specific classifier.

[36] After the clustering is completed, clusters must be identified and labeled. Labeling of the clusters is done by a human expert, using the Interactive Visualizer and Image Classifier for Satellites (IVICS) visualization software [Berendes et al., 2001]. Physical classes of interest are identified in the imagery and each cluster is labeled as one of those class types. Many clusters may be combined into a single physical class, and unneeded clusters may remain unlabeled or may be removed entirely.

[37] Finally, the labeled cluster set is used to classify imagery. Image classification is accomplished by computing the tentative mean and standard deviation updates and applying the cluster membership functions, Δμk(q) and Θk(q), using the labeled cluster set as the initial clusters. Each individual pixel in the image is assigned the class label of the cluster with which it would merge. The standard deviation limiting function, Θk(q), can either be applied or ignored. If Θk(q) is applied, a pixel can be labeled as "unknown" if it would form a new cluster based upon the logic in step 4 of the clustering algorithm. Conversely, if Θk(q) is ignored, pixels are simply labeled by the cluster that minimizes Σ[Δμk(q)]², ensuring that every pixel is assigned a class.

[38] It has been shown that spatial textures provide valuable features for cloud detection and classification [Sengupta et al., 1990]. The gray level difference vector (GLDV) method is a computationally efficient and effective method that has been used successfully in cloud classification algorithms [Chen et al., 1989]. The GLDV "contrast" measure applied to the GOES visible channel enhances small-scale spatial variations of reflectance that are indicative of cumulus clouds. Therefore, the GLDV contrast of the visible channel of GOES 12, MODIS, and MSG is used as an additional clustering feature. This texture-derived "channel" has proven to be a critical component for identifying daytime cumulus cloud types.

[39] The iterative SDLAC algorithm can be tuned to produce very stable clusters using a wide variety of satellite data types and can be utilized as a tool for cloud identification (e.g., cumulus identification for convection initiation purposes). Applications of the SDLAC algorithm are discussed in subsequent sections.

[40] There are limits of applicability that depend upon the representative imagery used in the training process. For example, if only polar imagery were used to create and label the clusters, the classifier would not be applicable to desert imagery simply because no clusters of the desert classes would be represented among the polar clusters. The SDLAC algorithm can be applied with a very specific focus (i.e., trained for a specific geographic region, season, or set of classes) or in a very general way (i.e., trained using a broad set of imagery representing multiple locations, seasons, etc.).

5. Examples

[41] The SDLAC clustering algorithm for the GOES 12, MODIS, and SEVIRI sensors is applied to produce a convective cloud mask for each data type. The clustering algorithm was applied to a set of images from each sensor, and the resulting clusters were labeled as belonging to seven different classes. Figure 1 shows a list of the classes along with the associated colors that will be used in subsequent figures. The classes were defined on the basis of visually identifiable features in the image. Note that the main goal was to identify convective clouds ("towering cumulus" and "cumulus") in various stages for use in convective nowcasting systems. Toward that end, minimal effort was made to distinguish cloud types beyond convective types, and they have been grouped into general categories such as "ice cloud" and "nonconvective water cloud." Two additional classes, "glaciated mature convection" and "overshooting convective tops," are also being examined for possible application in turbulence studies.

Figure 1.

Classes defined by SDLAC convective cloud mask.

[42] Figure 2a shows an example GOES channel 1 visible image over Lake Michigan and the resulting color-coded SDLAC image can be seen in Figure 2b. The primary function of this mask (i.e., the detection of convective clouds) is represented by the dark blue and cyan areas in Figure 2b. Visual comparison of Figures 2a and 2b shows that the convective classes (“cumulus” and “towering cumulus”) are well detected. Large pink areas of ice cloud can also be seen in Figure 2. The ice cloud class represents primarily diffuse (visually fuzzy) ice clouds that are warmer than glaciated convective tops and produce a low GLDV contrast value. No glaciated mature convection or overshooting tops are visible in the scene.

Figure 2.

(a) GOES 12 visible image centered over Lake Michigan taken at 2100 UTC 3 August 2003 and (b) color-coded SDLAC cloud mask (refer to Figure 1 for color code labels).

[43] An MSG SEVIRI example over northern Italy is shown in Figure 3, with channel 1 shown in Figure 3a and the SDLAC results in Figure 3b. Visual examination shows that, similarly to the GOES case, the MSG SDLAC mask detects convective clouds well. Additionally, the MSG example contains areas labeled "glaciated mature convection," which represent visually diffuse ice clouds at very low temperatures above a large convective cloud. Some of the "bumpy" texture of the cumulus cloud can be seen through the ice cloud because of gravity waves from the convective updraft penetrating the tropopause. The additional channels of the MSG SEVIRI sensor may provide more information about cloud phase, helping to make a better distinction than the GOES version.

Figure 3.

(a) MSG SEVIRI image centered over northern Italy taken at 1310 UTC 25 June 2006 and (b) SDLAC convective cloud mask (refer to Figure 1 for color code labels).

[44] MODIS allows for an examination of the benefits of increased spectral channels and 1-km infrared spatial resolution. Figure 4a shows an image over Kansas along with the corresponding SDLAC mask results in Figure 4b. As in the MSG and GOES cases, the smaller convective clouds shown in blue in Figure 4b are well represented. Large areas of "nonconvective water cloud," in this case stratocumulus, are shown in dark yellow. The stratocumulus is warmer with lower GLDV texture. The mature convective areas are present in the anvil of the large thunderstorm. The upper right corner of Figure 4b shows a red area of "overshooting mature convective tops," characterized by extremely low temperatures with high GLDV textures and generally high channel 1 (0.6 μm) reflectance. The additional channel information present in MODIS data may aid in the detection of overshooting tops.

Figure 4.

(a) MODIS image centered over northern Kansas taken at 1815 UTC 4 May 2003 and (b) SDLAC convective cloud mask (refer to Figure 1 for color code labels).

[45] Tropopause penetrations within deep convection, that is, "overshooting mature convective tops," are also detectable in GOES and MSG data using the SDLAC method. A version of the SDLAC clustering mask was produced specifically for the purpose of detecting overshooting tops in GOES imagery. Figure 5 shows an example of GOES imagery with active convection over Illinois. Figure 5a shows a three-band color composite of the active system, while Figure 5b shows the same area in the visible channel. Areas of overshooting tops are very cold and have a rougher textured appearance because the convection protrudes above the cirrus anvil. Figure 5c shows a three-band color composite with channel 1 in green, channel 2 inverted in blue, and the GLDV texture of channel 1 in red. Using this color scheme, the areas of high GLDV texture are clearly visible as red and yellow areas. The highly textured areas correspond to the overshooting tops and smaller convective clouds. Figure 5d shows the overshooting tops detected by the SDLAC algorithm in red. Clearly, the overshooting tops correspond well with the areas of high channel 1 GLDV texture, while the smaller convective clouds are eliminated, probably on the basis of temperature. Accurate detection of overshooting tops allows for the identification of strong convection and can be used as a tool for identifying the potential for aircraft turbulence, a strong application for aviation safety [see Mecikalski et al., 2007]. Some highlighted areas may not be overshooting tops but rather gravity waves or other features slightly penetrating the tropopause; it is important to note that these areas will still be highly turbulent and thus remain important to aviation safety.

Figure 5.

Example of an “overshooting top” mask for active convection. Data are from GOES 12: (a) a three-band enhanced color composite with channel 4 inverted in red, channel 3 in green, and channel 1 in blue; (b) an enhanced channel 1 visible image; (c) a three-band enhanced color composite with GLDV texture of channel 1 in red, channel 1 in green, and channel 2 inverted in blue; and (d) the “overshooting top” mask with red indicating tops. Data are from 2315 UTC 10 May 2003 over Illinois.

6. Conclusions

[46] SDLAC is a cluster-based cloud classification technique that is currently being used within an operational framework. While it is difficult to validate a “cloud mask” or other cloud classifications because of the subjective nature of cloud classification and the lack of “truth” data, the efficacy of the SDLAC convective cloud mask can be seen by subjective examination of the imagery. Additionally, the operational results of the SATCAST algorithm objectively show that the SDLAC convective cloud mask is accurately detecting the convectively active clouds [Mecikalski et al., 2008], which was the primary design objective. The general nature of the SDLAC clustering algorithm produces a classification technique that is not satellite platform dependent. As satellite technology improves (as in the case of GOES R [Schmit et al., 2005]), this method allows easy addition of the new spectral channels for possible improvement to cloud classification.

[47] When many cases from different seasons and different times are used, the statistical information of each cluster becomes robust. In our examples with GOES and MSG, we used sufficient cases over various seasons and viewing angles, reducing dependencies on time of year and location. This enhances the utility of the SDLAC method as a tool for cloud identification within automated systems such as SATCAST. The SDLAC algorithm has also been applied successfully as a detection algorithm for overshooting convective cloud tops, a capability with potential aviation safety application for turbulence detection.

Acknowledgments

[48] This research was supported by National Aeronautics and Space Administration grants NAS5-31718 and NAS1-98131. Additional funding has been provided by NASA grant NNX07AFF14G. The GOES 12 data were acquired by the GOES ground station at the Global Hydrology and Climate Center (GHCC), Huntsville, Alabama. The MSG data were kindly provided by the National Environmental Satellite, Data, and Information Service (NESDIS) center in Camp Springs, Maryland.
