We introduce a technique for assessing the diurnal development of convective storm systems based on outgoing longwave radiation fields. Using the size distribution of the storms measured from a series of images, we generate an array in the length scale-time domain based on the standard score statistic. It demonstrates succinctly the size evolution of storms as well as the dissipation kinematics. It also provides evidence related to the temperature evolution of the cloud tops. We apply this approach to a test case comparing observations made by the Geostationary Earth Radiation Budget instrument to output from the Met Office Unified Model run at two resolutions. The 12 km resolution model produces peak convective activity on all length scales significantly earlier in the day than shown by the observations and no evidence for storms growing in size. The 4 km resolution model shows realistic timing and growth evolution, although the dissipation mechanism still differs from the observed data.
 Deep convection constitutes the fundamental building block of tropical weather, producing cloud systems that organize and interact on many spatial scales. These span the range from individual convection cells, through clusters of intense thunderstorms embedded in mesoscale systems and disturbances such as African Easterly Waves, to superclusters related to global circulation patterns such as the Madden Julian Oscillation [Leary and Houze, 1979; Machado et al., 1993]. Similarly, a large range of temporal scales is involved, from the diurnal cycle through to interannual modes of variability.
 The upscaling of energy from the smaller and high-frequency scales to the larger and low-frequency regimes is still poorly understood. The Cascade project (Woolnough, manuscript in preparation, 2010) is designed to address this topic by running high-resolution cloud system resolving models (1.5 km) over large domains (∼4000 km) using the Met Office Unified Model (UM). The intention is to capture both the small-scale convection physics and large-scale circulation patterns in the same simulation. As one element, we are examining the diurnal development of convective systems over Africa and we shall address this in detail in a forthcoming paper. We present here a method developed to test the diurnal cycle as represented in numerical models against observation.
 The fundamental starting point for representation of the diurnal behavior of tropical convective systems is simply to plot cloud fraction or some other suitable proxy variable as a function of time of day. Such an approach, as in the early study of satellite infrared measurements by Gruber  for example, rapidly reveals the tendency for peak activity to occur in the early evening. Repeating this for multiple locations which can be overplotted is then trivial and additionally begins to reveal spatial information. A more sophisticated form of this idea, in the era of high-resolution satellite imagery, is to break a scene down to the pixel level and separately Fourier analyze each in time as in Yang and Slingo . These authors compared the results of a simulation run using the UM with the Cloud Archive User Service (CLAUS) data set [Hodges et al., 2000]. Their model was run for a simulated year for all longitudes in the 30°S–30°N latitude band with a resolution of 3.75° in longitude and 2.5° in latitude. They conducted their comparative analysis through two variables acting as proxies for convective activity: window channel brightness temperature and precipitation rate. At this resolution, these are likely detecting mesoscale convective systems or larger aggregations. For both variables, the phase of the diurnal Fourier components indicated that the models produced peak convective activity much earlier in the day than the observations.
 A generalization of the Fourier approach that includes spatial variability is principal component analysis with empirically orthogonal functions as used by Murakami , Lau and Chan , and Comer et al.  inter alia. This latter paper used data from the Geostationary Earth Radiation Budget (GERB) instrument [Harries et al., 2005] to study the diurnal cycle of outgoing longwave radiation (OLR). It showed that 95% of the variation could be explained by the first two components and that the development of cloudiness was strongly influenced by orographic features.
 While such studies yield spatiotemporal information, they do not relate directly to the evolutionary processes in a system's lifecycle. It is possible to tackle this using a tracking method that allows individual storms to be followed from image to image over time [e.g., Williams and Houze, 1987; Futyan and Del Genio, 2007]. This enables all the measurable parameters to be labeled with position and time coordinates and plotted accordingly. While this approach is now quite common, the implementation can be time-consuming and sensitive to the many parameters that are required. In addition, to make sense of such information, the data still need to be reaggregated in some form, or else we are left with individual case studies of individual systems. The method we set out here is fast and has essentially a single parameter that rapidly produces a representation of the evolution of storm systems.
 The model data being tested was generated using the nonhydrostatic UM version 7.1 in a configuration previously used by Lean et al. [2008, 2009]. In summary, the model solves the atmospheric dynamical equations using a semi-implicit, semi-Lagrangian scheme on a rotated latitude-longitude grid [Davies et al., 2005; Cullen et al., 1997]. It utilizes parameterizations to represent, inter alia, subsurface and surface fluxes [Essery et al., 2001], boundary layer turbulence [Lock et al., 2000], mixed-phase microphysics [Wilson and Ballard, 1999] with multiple hydrometeor components and, in general, convection [Gregory and Rowntree, 1990]. The model was run in a limited area mode forced with constant sea surface temperatures and initiated with analysis fields from the European Centre for Medium-Range Weather Forecasts. These fields were subsequently updated only at the lateral boundaries.
 The observed data for comparison was provided by GERB. This is a broadband radiometer on the Meteosat-8 satellite producing data at a nadir resolution of 50 km. The satellite is positioned on the equator at 3.5°W providing a field of view covering the whole of Africa and stretching to Northern Europe, the Middle East and parts of South America. GERB makes radiation measurements every 6 min enabling it to follow the evolution of medium- to large-scale weather systems. We made use of a hybrid product (NRT V003 ARCH) that additionally includes information from the Spinning Enhanced Visible and InfraRed Imager (SEVIRI), also on Meteosat-8, to produce high-resolution OLR measurements with approximately 10 km resolution. The product is described in Dewitte et al.  where it is termed “Standard High-Resolution Image (SHI)” and it has been used previously to study the effect of Saharan dust on the atmospheric radiation balance [Slingo et al., 2006].
 The test area covered West Africa and used 10 days of model data running from 26 July to 4 August and 17 days of GERB observations from 22 July to 7 August during 2006. The model was run at two resolutions: with 460 by 340 pixels of 0.11° and with 1110 by 776 pixels of 0.036°, corresponding to resolutions at the center of the domain of 12 km and 4 km, respectively. The convection scheme differed between the two models in that the 12 km model scheme used a convective available potential energy closure method [Gregory and Rowntree, 1990], whereas the 4 km model used a version in which the mass flux at the cloud base was limited. In practice, the 4 km resolution model was tuned such that almost all of the convection was represented explicitly rather than through parameterization. This configuration leads to an improved representation of larger storms at the expense of missing some weak showers [Lean et al., 2008]. The 12 km model included two components in the microphysics scheme (cloud liquid and frozen water) and 38 vertical levels, whereas the 4 km model had three hydrometeor components (additionally prognostic rain) and 70 vertical levels. The 12 km model was run with a 300 s time step and the 4 km model with a 60 s time step for the first 5 days and a 30 s time step thereafter. In addition to an improvement in the direct representation of atmospheric processes, Hohenegger et al. showed how moving to an explicit convection, cloud-resolving model resulted in a radical change to the behavior of land-atmosphere feedbacks.
 The GERB data were subsetted and regridded to match the area and resolution of the 12 km model data (approximately 5°S–35°N, 25°W–25°E). The 4 km model was nested inside the 12 km model region and covered a smaller area (approximately 0°N–30°N, 20°W–20°E). These regions are shown in Figure 1. Both the model output and the observed data products were generated at 15 min intervals. As the models were driven only at the boundary by analysis fields, we are not able to match individual observed clouds to the model. Thus, comparisons with observation rely on indirect statistical methods.
 Pixels were identified as cloud on the basis of being below a threshold OLR flux (Fth). An alternative approach would be to use SEVIRI window channel data for the observations and apply a more sophisticated algorithm to identify clouds which could then be compared to internally flagged cloudy regions from the model. However, as we intend to study the model representation of convection, we prefer the approach that the burden should rest with the numerical simulations to produce directly observable quantities. These can then be processed in an identical way for both observations and models. With the observations then acting as “truth,” any differences must rest with some aspect of the modeling. The alternative approach leaves open the possibility that the cloud detection algorithm applied to the observations is the source of the discrepancies.
 It is a common approach to identify cold cloud regions with sites of deep convection. However, studies such as those by Rickenbach and Rickenbach et al. indicate that as propagating systems mature, the locations of cold cloud can become separated from those of precipitation and that these cloudy regions can continue to grow despite becoming decoupled from the original convection cells. This can complicate detailed interpretation of the convective behavior. However, understanding the diurnal cycle of the cloud cover itself is of direct relevance to studies of the radiative processes and feedbacks that are important to both weather and climate prediction. In particular, both the sign and magnitude of cloud radiative forcing vary according to the diurnal phase. During the day, the reflection of incoming radiation by cloud tends to dominate over the trapping of OLR emitted by the surface, which is the dominant effect at night. Cloud feedbacks are the largest uncertainties in climate sensitivity estimates [Randall et al., 2007].
Mapes and Houze show the wide range of blackbody temperatures (188–267 K) that could be used to determine a cloudy region. Among these, Fu et al. use temperature thresholds of 215 K for deep convective clouds and 267 K for convective anvil cloud. Our fluxes are not directly comparable, however, as they are broadband measurements that include absorption features, in contrast to an effective temperature based on narrow window channel measurements. Fu et al. also quote a value of 240 W m−2 as an OLR threshold often used to diagnose deep convection. We use Fth = 150 W m−2 here initially, which corresponds to an effective broadband emission temperature (Tb,e) of 227 K. The OLR flux distributions plotted in Figure 2 show that this conservatively restricts the sample to cold cloud tops, minimizing any possible noncloud contamination. While increasing this threshold increases the range of length scales being sampled, it reduces the ability to identify the coldest regions at small scales arising from deep convection. Contiguous regions of pixels below Fth sharing an edge (i.e., not including diagonal adjacency) were then identified as clusters using the intrinsic IDL routine label_region, and the area (A), based on the nominal equatorial resolution, and length scale ($L = \sqrt{A}$) were calculated.
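The thresholding and clustering step can be illustrated with a minimal Python sketch. It is not the IDL code used for the analysis: the function names, the toy OLR field, and the 12 km pixel size are assumptions made for illustration, and a breadth-first flood fill stands in for label_region. Only edge-sharing neighbours are joined, as described above.

```python
import math
from collections import deque


def label_clusters(olr, f_th):
    """Label edge-connected regions of pixels with OLR below f_th.

    Returns (labels, n_clusters); labels holds 0 for non-cloud pixels
    and 1..n_clusters for cloud clusters.
    """
    n_rows, n_cols = len(olr), len(olr[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    n_clusters = 0
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if olr[r0][c0] >= f_th or labels[r0][c0]:
                continue
            n_clusters += 1
            labels[r0][c0] = n_clusters
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                # Edge-sharing neighbours only; diagonals are excluded.
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and olr[rr][cc] < f_th and not labels[rr][cc]):
                        labels[rr][cc] = n_clusters
                        queue.append((rr, cc))
    return labels, n_clusters


def cluster_length_scales(labels, n_clusters, pixel_km):
    """Length scale L = sqrt(A), with A = pixel count * pixel_km**2 (assumed uniform)."""
    counts = [0] * (n_clusters + 1)
    for row in labels:
        for lab in row:
            counts[lab] += 1
    return [math.sqrt(n * pixel_km ** 2) for n in counts[1:]]


# Toy OLR field in W m^-2; the values are invented for illustration.
olr = [
    [140, 145, 260, 260],
    [260, 148, 260, 130],
    [260, 260, 120, 260],
]
labels, n = label_clusters(olr, f_th=150.0)
scales = cluster_length_scales(labels, n, pixel_km=12.0)
```

In this toy field the two isolated cold pixels touch the three-pixel cluster only diagonally, so they remain separate clusters.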
 To examine the character of the diurnal signal, the cluster length scale distribution at each time (t) was normalized by calculating the standard score statistic (Z) for the number of systems (N) in each length scale bin

$$Z(L,t) = \frac{N(L,t) - \bar{N}(L)}{\sigma(L)} \qquad (1)$$

using the measured mean ($\bar{N}$) and measured standard deviation (σ) over time for each length scale bin separately. Taking the Fourier transform of this array in the time direction confirms the presence of a dominant signal on a period of 1 day. Accordingly, while calculations were carried out on the full field, the plotted results are folded on the daily cycle (i.e., diurnally composited).
Equation (1) measures the deviation from the mean number of storms at each size in multiples of the measured standard deviation for that size bin. As such, it is essentially “self-normalized” by the data set and makes no assumption about any underlying spatiotemporal distribution of the data. The values generated from equation (1) for the GERB data are shown in the gray scale image in Figure 3. The length scale bins in this and subsequent figures are logarithmic with edge values given by $L_i = 2^{2+i/4}$ km.
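A minimal Python sketch of the standard score and the diurnal folding may help make the procedure concrete. The function names are hypothetical, the bin index is assumed to start at i = 0, and the population form of the standard deviation is assumed, since the text does not say which estimator was used.

```python
import math


def bin_edges(n_bins):
    """Logarithmic bin edges 2**(2 + i/4) km for i = 0..n_bins (start index assumed)."""
    return [2.0 ** (2.0 + i / 4.0) for i in range(n_bins + 1)]


def standard_score(counts):
    """Z[t][b] = (N[t][b] - mean_b) / sigma_b, with statistics taken over time.

    counts[t][b] is the number of clusters at frame t in length scale bin b.
    """
    n_t, n_b = len(counts), len(counts[0])
    z = [[0.0] * n_b for _ in range(n_t)]
    for b in range(n_b):
        col = [counts[t][b] for t in range(n_t)]
        mean = sum(col) / n_t
        sigma = math.sqrt(sum((x - mean) ** 2 for x in col) / n_t)
        for t in range(n_t):
            # A bin that never varies carries no signal; leave Z at zero.
            z[t][b] = (col[t] - mean) / sigma if sigma > 0 else 0.0
    return z


def diurnal_composite(z, frames_per_day):
    """Fold Z on the daily cycle by averaging frames at the same time of day."""
    n_b = len(z[0])
    total = [[0.0] * n_b for _ in range(frames_per_day)]
    hits = [0] * frames_per_day
    for t, row in enumerate(z):
        k = t % frames_per_day
        hits[k] += 1
        for b in range(n_b):
            total[k][b] += row[b]
    return [[v / hits[k] if hits[k] else 0.0 for v in total[k]]
            for k in range(frames_per_day)]


# Two invented "days" of four frames each, one length scale bin.
counts = [[1], [3], [5], [3], [1], [3], [5], [3]]
z = standard_score(counts)
folded = diurnal_composite(z, frames_per_day=4)
```

Because the mean and standard deviation are computed per bin from the data themselves, no reference distribution has to be supplied.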
 In the following analysis, we proceed on the basis that the data is composed of two dominant components. Fundamentally, we assume that there is an underlying random distribution for the number of clouds in each scene which, being essentially counting statistics, is Poisson. We also assume that there is an additional diurnal signal that results in no net increase in the number of storms when averaged over the daily cycle but has a potentially large variance.
 In order to find regimes where the diurnal forcing is having an impact on the number of systems, the significance of each measured value of N(L,t) can be calculated from a Poisson distribution with the appropriate mean ($\bar{N}(L)$) for each length scale bin. The contours for several probability levels are overplotted on Figure 3. These contours indicate the size regimes and time of day where there is significant evidence for the effect of a nonrandom signal in addition to the underlying Poisson distribution. We should note in passing that, in a situation with large $\bar{N}$ and a weak diurnal signal, the Poisson distribution could be approximated as Gaussian with $\sigma = \sqrt{\bar{N}}$ and the probability calculation would reduce to a simpler case based directly on Z.
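The significance calculation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, a one-sided upper-tail probability is assumed (the text does not specify the tail), and the counts in the example are invented.

```python
import math


def poisson_pmf(k, mu):
    """P(N = k) for a Poisson distribution with mean mu > 0."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def poisson_upper_tail(n_obs, mu):
    """P(N >= n_obs): chance of seeing at least n_obs systems from noise alone."""
    if n_obs <= 0:
        return 1.0
    below = sum(poisson_pmf(k, mu) for k in range(n_obs))
    return max(0.0, 1.0 - below)


def gaussian_upper_tail(n_obs, mu):
    """Large-mean approximation: Gaussian with sigma = sqrt(mu)."""
    z = (n_obs - mu) / math.sqrt(mu)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Invented example: 10 systems observed in a bin whose mean count is 4.
p_small = poisson_upper_tail(10, 4.0)
# Invented large-mean example where the Gaussian form is adequate.
p_large = gaussian_upper_tail(440, 400.0)
```

Contours such as those described above would then follow from evaluating this probability for every (L, t) cell and drawing lines of constant probability.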
 The images formed from equation (1) for the three data sets are plotted alongside each other in Figure 4. This figure provides an elegant tool to compare and contrast the diurnal behavior of convection in the different data sets. Similar to the way that a Hovmöller plot demonstrates the trajectory of some anomaly against a position coordinate, these plots represent the evolution in length scale space. The bright areas show, for each data set, the times when there are more storm systems than average at each length scale. The GERB data show a broad upward stripe beginning shortly after midday, fitting with the paradigm of small convective systems forming, growing, and merging during the day. In contrast, the 12 km model data exhibit a flat response across all length scales, showing that storm systems of all sizes switch on and off together. There is a tendency for parameterized schemes to generate deep convection rapidly, shortly after sunrise [Guichard et al., 2004; Grabowski et al., 2006], rather than generating steadily deepening convection cells. The parameterized convection scheme also triggers convection independently in grid cells. As a result, cold cloud tops are generated very early in the day with an initial cluster size distribution reflecting the similarity or otherwise of the conditions in adjacent grid cells.
 The 4 km model bears much closer comparison to the observations with a progression in the excess number of systems from medium to large length scales at a similar rate to the observations. An intriguing additional feature, however, is that the peak for smaller systems occurs early in the morning. From animations of the OLR this appears to correspond to the way that the large systems in the model tend to “shatter” overnight into many small systems rather than the more organic “dissolving” behavior exhibited by the GERB data. Two series of snapshots illustrating this for the 4 km model and GERB data are shown in Figures 5 and 6, respectively.
 The improvement in the representation of convection with resolution is consistent with the results of Guichard et al.  who compared the behavior of seven single column models (SCM) with parameterized convection to that of three cloud resolving models (CRM) with horizontal resolutions of 2 km or better. The SCMs generally showed precipitation beginning early in the day and continuing until evening. In contrast, the CRMs produced precipitation strongly peaked in the late afternoon and early evening. While our 4 km model has too coarse a resolution to be truly cloud resolving and still includes convective parameterization, it generates less than 1% of the total rainfall through this scheme. In contrast, 94% of the rainfall is generated by the convection scheme in the 12 km model.
 As a test of our understanding, we can apply a Fourier analysis as described earlier to the data. In this case, we calculate the Fast Fourier Transform of Z(L,t) in the time direction and examine the phase (ϕ(L)) of the coefficient corresponding to the diurnal signal. This phase is plotted against length scale in Figure 7 for the three data sets. The diagram confirms the principal interpretation of Figure 4 given above, with the 12 km model response flat across all length scales and earlier in the day than the observations, and the 4 km model results diverging from the observations at short length scales. The apparent precision of this type of plot makes it appealing to use as the primary analysis tool rather than the underlying image of Z(L,t). However, in situations with only slightly greater complexity, ϕ(L) can begin to vary significantly and loses its utility. In contrast, plots of Z(L,t) retain the subtlety of the evolutionary behavior and remain useful.
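The phase extraction can be sketched in Python for a single length scale bin. This is only an illustration: the function name is hypothetical, frame 0 is assumed to fall at 00:00, and the coefficient is evaluated directly at the one-cycle-per-day frequency instead of through a full Fast Fourier Transform, which yields the same coefficient.

```python
import cmath
import math


def diurnal_peak_hour(z, frames_per_day):
    """Time of day (hours) at which the 1-day harmonic of z peaks.

    z is a time series for one length scale bin spanning a whole number of
    days, with frame 0 assumed to fall at 00:00.
    """
    n = len(z)
    if n == 0 or n % frames_per_day:
        raise ValueError("series must span a whole number of days")
    n_days = n // frames_per_day
    # Fourier coefficient at n_days cycles per record, i.e. one cycle per day.
    coeff = sum(z[t] * cmath.exp(-2j * math.pi * n_days * t / n)
                for t in range(n))
    phase = cmath.phase(coeff)  # harmonic varies as cos(2*pi*t/P + phase)
    peak_frame = ((-phase / (2.0 * math.pi)) % 1.0) * frames_per_day
    return peak_frame * 24.0 / frames_per_day


# Synthetic two-day series at 15 min sampling, peaking at 18:00 (frame 72).
frames_per_day = 96
series = [math.cos(2.0 * math.pi * (t - 72) / frames_per_day)
          for t in range(2 * frames_per_day)]
peak = diurnal_peak_hour(series, frames_per_day)
```

Repeating this for each length scale bin gives a curve of peak time against L of the kind plotted in Figure 7.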
 An equivalent diagram to Figure 4 is plotted in Figure 8 but now using Fth = 210 W m−2 (Tb,e = 247 K). At this level, the clusters are additionally sampling significantly lower, warmer clouds. Both models show the same behavior as at the lower threshold flux level. In contrast, the GERB panel is now remarkably different, exhibiting a clear semidiurnal signal. The previous late afternoon evolution appears to have been retained, if less distinctly, at all length scales. However, there is now an additional peak number of systems occurring at the same early morning time as in the 4 km model for small- and medium-scale systems. This covers a larger range of length scales than the 4 km model but there is no slope apparent that might suggest a progression from larger to smaller systems mirroring the earlier growth for the cold systems. Instead, the large cold systems that have been generated during the day appear to break up as warmer systems with a spectrum of sizes in the early morning. The consistency of the behavior of the 4 km model at both flux thresholds suggests that, in this model, we are seeing growth of cold systems during the day toward large scales that then appear to break up suddenly into small fragments of equally cold cloud.
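The effective temperatures quoted for the two thresholds are consistent with inverting the Stefan–Boltzmann law for a blackbody. The text does not state the conversion explicitly, so the relation below is an assumption; $\sigma_{\mathrm{SB}}$ denotes the Stefan–Boltzmann constant, written with a subscript to avoid confusion with the standard deviation σ.

```latex
T_{b,e} = \left( \frac{F_{th}}{\sigma_{\mathrm{SB}}} \right)^{1/4},
\qquad \sigma_{\mathrm{SB}} = 5.670 \times 10^{-8}\ \mathrm{W\,m^{-2}\,K^{-4}}

F_{th} = 150\ \mathrm{W\,m^{-2}}: \quad
T_{b,e} = \left( 2.645 \times 10^{9}\ \mathrm{K^{4}} \right)^{1/4} \approx 226.8\ \mathrm{K} \approx 227\ \mathrm{K}

F_{th} = 210\ \mathrm{W\,m^{-2}}: \quad
T_{b,e} = \left( 3.703 \times 10^{9}\ \mathrm{K^{4}} \right)^{1/4} \approx 246.7\ \mathrm{K} \approx 247\ \mathrm{K}
```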
 Several authors have reported semidiurnal cycles in total cloudiness and precipitation in the tropics [e.g., Gruber, 1976; Augustine, 1984; Liu and Zipser, 2008]. Almost invariably, these occur over the ocean. Minnis and Harrison  do report two maxima in the diurnal behavior of cloudiness over parts of South America but they attribute this to the peculiar local interaction of the coastline and mountainous terrain. In direct contrast, versions of Figure 8 generated separately for each quarter of the GERB domain show a clear double peaked behavior arising from the NE quadrant (that is almost exclusively land) but no apparent signal from the SW quadrant (containing a substantial fraction of ocean). This result is consistent, however, with Allan et al.  who examined the cloud masks derived from SEVIRI data for equatorial Africa (7°E–45°E,10°S–10°N) during July 2006. They showed that, while the convective cloud fraction exhibited a single-peaked diurnal signal, the total cloud fraction that also included warmer clouds had two peaks at 3 and 12 UTC.
 The underlying distributions of the two statistics used to generate Figures 4 and 8, the mean ($\bar{N}$) and standard deviation (σ), themselves yield useful information about the relative behavior of modeled and observed systems. The mean size distributions plotted in Figure 9 appear to follow a power law of the form $\bar{N} = N_0 L^{-\Gamma}$ over a large range of length scales, as found by several other studies [e.g., Machado et al., 1992]. Zhao and Di Girolamo highlight the difficulties involved with comparing parameters derived from fitting to such a form from different data sets due to variations such as scene selection and cloud detection criteria, regional peculiarities, the overall meteorological state, and the details of the fitting procedures themselves. Nonetheless, having produced three data sets in a consistent way from the same physical quantity, we can be confident, at least, of being able to use this as a fair test of our models against the GERB data.
 The derivation of values for the parameters N0 and Γ requires a weighted fit to the data, which in turn requires an estimate for the standard error ($\sigma_{\bar{N}}$) on each data point. The plotted error bars represent the measured standard deviation σ of each bin. However, the significant variability evinced by the size of these error bars is due largely to the systematic diurnal cycle signal superimposed on the random variability. To make progress we must appeal to our assumption that the diurnal signal averages to zero and that the mean size distribution is thus subject solely to an underlying intrinsic Poisson noise with $\sigma = \sqrt{\bar{N}}$. We would then normally proceed by calculating the standard error on the mean in each bin from the population variance, $\sigma_{\bar{N}}^2 = \sigma^2/n_f = \bar{N}/n_f$, where $n_f$ is the total number of image frames in the data set. However, given that systems can persist for many hours, it is not valid to assume that each image is an independent trial. Using a mean lifetime for a convective storm system ($t_c$) and interval between images (Δt), we have instead effectively $n_f \Delta t / t_c$ independent trials. The standard error on the mean number of systems in each length scale bin we adopt is thus $\sigma_{\bar{N}} = \sqrt{\bar{N} t_c / (n_f \Delta t)}$. In order to place conservative error estimates on the fitted parameters, we use a long $t_c$ = 6 h. This results in 40 “trials” over the course of the 10 days of model data. It is therefore reasonable to appeal to the central limit theorem and assume that the distribution of mean values has a Gaussian distribution. Values derived from a nonlinear least squares fit to the data sets are listed in Table 1. The fitted region was restricted to 30 km < L < 200 km to avoid apparent breaks in the power law. The reduced χ² values are all agreeably <1, consistent with our conservative error assumptions.
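The error estimate and fit can be sketched in Python as below. Note the swapped-in technique: the paper uses a nonlinear least squares fit, whereas this sketch, for brevity, performs a weighted straight-line fit in log space, which is a simplification and not the authors' method. The function names and the synthetic data are assumptions made for illustration.

```python
import math


def standard_error(mean_count, n_frames, dt_hours, tc_hours):
    """Standard error on a bin's mean count.

    Assumes Poisson noise (variance equal to the mean) and
    n_frames * dt / tc effectively independent frames.
    """
    n_eff = n_frames * dt_hours / tc_hours
    return math.sqrt(mean_count / n_eff)


def fit_power_law(length_km, mean_counts, errors):
    """Weighted straight-line fit of ln N = ln N0 - Gamma * ln L.

    Returns (N0, Gamma). The error on ln N is taken as error / N, which
    holds for small relative errors.
    """
    x = [math.log(v) for v in length_km]
    y = [math.log(v) for v in mean_counts]
    w = [(n / e) ** 2 for n, e in zip(mean_counts, errors)]
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * sxx - sx ** 2
    slope = (s * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta
    return math.exp(intercept), -slope


# Ten days of 15 min frames with tc = 6 h gives 40 effective trials.
n_frames, dt_hours, tc_hours = 960, 0.25, 6.0
# Synthetic, noise-free distribution with N0 = 200 and Gamma = 1.3.
length_km = [32.0, 64.0, 128.0]
mean_counts = [200.0 * v ** -1.3 for v in length_km]
errors = [standard_error(m, n_frames, dt_hours, tc_hours) for m in mean_counts]
n0, gamma = fit_power_law(length_km, mean_counts, errors)
```

Lengthening tc reduces the number of effective trials and so widens the errors, which is why a long lifetime is the conservative choice.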
Table 1. The Derived Parameters for the Best Fit Power Laws of the Form $\bar{N} = N_0 L^{-\Gamma}$ to the Mean Size Distribution of Cloud Clusters for Two Values of Fth (a)

| Data set | N0 (Fth = 150 W m−2) | Γ (Fth = 150 W m−2) | N0 (Fth = 210 W m−2) | Γ (Fth = 210 W m−2) |
|---|---|---|---|---|
| GERB | 189 ± 30 | 0.931 ± 0.038 | 2590.9 ± 1.1 | 1.40482 ± 0.00011 |
| 4 km model (rebinned) | 2240 ± 310 | 1.304 ± 0.034 | 2880 ± 390 | 1.352 ± 0.033 |
| 4 km model (full resolution) | 2960 ± 400 | 1.360 ± 0.033 | 3450 ± 470 | 1.392 ± 0.033 |
| 12 km model | 3140 ± 690 | 1.633 ± 0.055 | 2290 ± 490 | 1.517 ± 0.053 |

(a) The fitted region was restricted to 30 km < L < 200 km to avoid apparent breaks in the power law. The value of N0 for the 4 km model was corrected to account for the smaller area it covered.
 At the lower threshold, the fitted power law index (Γ) is much smaller for the observed data than for the models. This is apparent in Figure 9, where the slope of the GERB data clearly differs from that of the models. The two model distributions appear to have similar slopes. However, the derived error estimates for Γ from the models indicate that they are formally inconsistent with each other. Although the 4 km model does represent an improvement in Γ, it consistently overpredicts the amount of cloud at all length scales. The model values change little between the two thresholds. However, the observed value of Γ is much larger at the higher flux threshold, now falling between the two model values and in agreement with them both at around the 2σ level.
 The above remarks concerning the difficulties of intercomparison notwithstanding, Zhao and Di Girolamo  present values for Γ ranging from 1.19 to 2.18 for cumulus cloud in the size range of interest [Cahalan and Joseph, 1989; Sengupta et al., 1990; Benner and Curry, 1998]. These arise from high-resolution observational studies (∼50 m) with similar numbers of pixels to ours but consequently much smaller domains. Machado et al.  use Meteosat data with a similar resolution and covering a similar region to ours, finding Γ = 1.3 for their warm detection threshold (TIR = 253 K) and Γ = 0.8 for TIR = 218 K. In all these cases, we have accounted for the authors' use of the number density and our logarithmic binning by subtracting 1 from the quoted values.
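The subtraction of 1 follows from integrating a power law number density over logarithmic bins. The short derivation below is a sketch under the assumption that the quoted number densities are per unit length scale; the symbols $n(L)$, $\gamma$, $C$, and $r$ are introduced here for illustration and do not appear in the cited papers' notation.

```latex
n(L) = \frac{dN}{dL} = C\,L^{-\gamma}, \qquad L_{i+1} = r\,L_i \quad (r = 2^{1/4}\ \text{for the bins used here})

N_i = \int_{L_i}^{rL_i} C\,L^{-\gamma}\,dL
    = \frac{C\left(r^{\,1-\gamma} - 1\right)}{1-\gamma}\,L_i^{\,1-\gamma}
    \;\propto\; L_i^{-(\gamma - 1)} \qquad (\gamma \neq 1)

\Rightarrow \quad \Gamma = \gamma - 1
```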
 As we move to longer length scales, the diurnal signal in Figure 4 becomes less distinct. Figure 10 shows the measured variance-to-mean ratio $\sigma^2/\bar{N}$ plotted against L for the three data sets using Fth = 150 W m−2. Since the main source of variability is the diurnal signal, this plot essentially measures the strength of that signal against length scale. All three data sets asymptote to the $\sigma^2 \approx \bar{N}$ that one would expect for pure Poisson noise, supporting our earlier assumption regarding the underlying distribution. However, both models inject a greater signal than is apparent in the observations at length scales ≲100 km, although the 4 km model does represent an improvement. The 12 km model generates a large number of single pixel storms with a strong diurnal variability that runs off the scale in Figure 10.
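The variance-to-mean diagnostic is straightforward to compute for each length scale bin. The following Python sketch uses a hypothetical function name, the population form of the variance (an assumption), and invented counts.

```python
def variance_to_mean(counts):
    """Ratio of the variance to the mean of the counts in one length scale bin.

    A value near 1 is what pure Poisson noise would give; larger values
    indicate additional variability such as a diurnal signal.
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        return float("nan")
    variance = sum((c - mean) ** 2 for c in counts) / n
    return variance / mean


# Invented counts for one bin over four frames.
ratio = variance_to_mean([2, 4, 6, 4])
```

Evaluating this ratio bin by bin and plotting it against L gives a curve of the kind shown in Figure 10.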
 A technique for assessing the diurnal development of convective storm systems has been presented based on comparisons of OLR simulated by high-resolution versions of the Met Office UM and observed by GERB. Applying an OLR thresholding technique to define the distribution of convective systems, a direct representation of the aggregated evolutionary behavior of convective storms can be generated by considering anomalies in the size distribution through the standard score statistic. The values in the length scale-time domain reveal the time of peak activity for storms of different sizes from which the growth and dissipation behavior can be inferred. Greater precision and insight can be gained by using the underlying statistics of the distributions to define how data sets differ in the way they exhibit convection. The 12 km model test case produced convection significantly earlier in the day than observed and showed no evidence for size growth. In contrast, the 4 km model produced realistic evolution at medium to large scales but the small-scale behavior was dominated by the large number of systems suddenly produced when the storms broke up overnight rather than the gentler dissipation in the observed data. Both models generated a greater variability in the number of storms of all sizes.
 Now that we can diagnose the realism or otherwise of a model's evolutionary behavior, we can begin to probe the mechanisms that cause this. Is the crucial process mesoscale circulation such as downdrafts and subsequent cold pools propagating as density currents and organizing convection? How do the clouds “communicate” with each other? Is it through direct tendencies or the large-scale environment? What is the balance between the growth of individual systems and mergers? We also intend to investigate the relationship between the African Easterly Wave present in the domain and the growth behavior, the representation of initiation in different regions in the domain and the structure of the mature systems.
 This work was undertaken as part of the Cascade project funded by the Natural Environment Research Council under grant NE/E00525X/1. We thank three anonymous reviewers for their helpful remarks on the initial version of this paper.