## 1. Introduction

The modelling of clouds in general circulation models (GCMs) requires a sound theoretical framework for representing subgrid-scale humidity and temperature variations, together with a balance between the complexity of the parametrization and computational efficiency. One approach to meeting these demands is to apply statistical cloud schemes, which account for the spatial variability of a modelled quantity within the grid box in terms of a probability density function (PDF). Pincus and Klein (2000) found that accounting for subgrid-scale variability in the microphysical processes of cloud schemes can reduce the bias in nonlinear process rates caused by using averaged quantities, and can also avoid arbitrary tuning of parameters. Early statistical schemes were developed by Sommeria and Deardorff (1977) and Mellor (1977). They used a joint PDF of the subgrid-scale liquid temperature and total water content to estimate the cloud fraction by integrating over the saturated part of the PDF. This concept implies that the saturated part of the PDF condenses immediately and that the chosen PDF shape is able to reproduce the spatial distribution of the total water content.
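The integration over the saturated part of the PDF can be written compactly. The following is a minimal sketch in a simplified one-variable form, assuming that temperature variability is neglected so that the saturation value is constant across the grid box. The symbols are notation chosen here for illustration and are not taken from the schemes cited above: G is the normalized PDF of total water content q_t, q_s is the saturation value, C is the cloud fraction and \bar{q}_c is the grid-mean condensate.

```latex
% Simplified one-variable form; q_s is assumed constant within the grid box.
\begin{align}
  C         &= \int_{q_s}^{\infty} G(q_t)\,\mathrm{d}q_t, \\
  \bar{q}_c &= \int_{q_s}^{\infty} (q_t - q_s)\,G(q_t)\,\mathrm{d}q_t,
  \qquad \text{with } \int_{-\infty}^{\infty} G(q_t)\,\mathrm{d}q_t = 1 .
\end{align}
```

The second integral expresses the assumption of immediate condensation: all total water in excess of saturation is counted as condensate.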

The simplest form of PDF is a symmetric uniform distribution, which was used by Le Treut and Li (1991) to describe the horizontal variability of the sum of water vapour and cloud condensate in a grid box. A similar approach was taken by Smith (1990), who used a relative humidity threshold function derived from a symmetric triangular PDF that diagnoses the variance of cloud water content. This simple condensation scheme was adopted by Rotstayn (1997) and Nishizawa (2000). A PDF that is likewise symmetric, but Gaussian, was applied by Bechtold *et al.* (1995) to model the cloud water content, partial cloudiness and liquid-water flux in stratocumulus cases. These simple PDFs evidently cannot reproduce the spatial distribution of the respective quantity in all cases, but they are easy to handle and have a low computational cost. A PDF that can also be skewed allows a quantity to be described by the same PDF in different cases. A beta distribution, which has this capability, was employed by Tompkins (2002) to represent the spatial distribution of total water mixing ratio, and Bony and Emanuel (2001) used a log-normal distribution to model the total water content. Bougeault (1981) showed that a skewed exponential distribution can be applied to parametrize clouds in a one-dimensional model.
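To make the low cost of the symmetric PDFs concrete, the sketch below evaluates the cloud fraction as the part of the PDF lying above saturation for a uniform and a Gaussian distribution. It is an illustrative simplification, not the formulation of any of the cited schemes: the function names, the arguments (`qt_mean`, `half_width`, `sigma`, `q_sat`) and the assumption of a single saturation value per grid box are all hypothetical.

```python
import math


def cloud_fraction_uniform(qt_mean, half_width, q_sat):
    """Fraction of a uniform PDF on [qt_mean - half_width, qt_mean + half_width]
    that lies above the (assumed constant) saturation value q_sat."""
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")
    upper = qt_mean + half_width
    # Length of the saturated interval divided by the total width, clipped to [0, 1].
    fraction = (upper - q_sat) / (2.0 * half_width)
    return min(1.0, max(0.0, fraction))


def cloud_fraction_gaussian(qt_mean, sigma, q_sat):
    """Fraction of a Gaussian PDF (mean qt_mean, standard deviation sigma)
    that lies above q_sat, i.e. one minus the normal CDF at q_sat."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.erfc((q_sat - qt_mean) / (sigma * math.sqrt(2.0)))
```

In both cases the cloud fraction follows in closed form from two parameters; a skewed family needs at least one additional shape parameter.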

More comprehensive concepts have also been developed. Watanabe *et al.* (2009) used double-uniform and skewed-triangular distributions under different conditions for temperature and total water content fluctuations. Golaz *et al.* (2002) applied a joint double-Gaussian PDF of vertical velocity, liquid water potential temperature and total specific water content to characterize the unresolved subgrid variations in a grid box. A better representation of low-level boundary-layer clouds and an improved atmospheric boundary-layer structure were achieved by Kuwano-Yoshida *et al.* (2010) using joint-Gaussian PDFs of the liquid water potential temperature and total water content in the atmospheric general circulation model for the Earth Simulator (AFES).

This short overview of statistical PDF schemes shows that many different forms of PDF are used to describe a given quantity in cloud microphysical processes. The reasons are, on the one hand, the complexity of the scheme, e.g. how many parameters are needed to determine the shape of the PDF, and, on the other hand, the particular humidity-related quantity to be simulated in the model. Although the choice of the assumed shape of the humidity distribution is a crucial criterion for the successful performance of a statistical PDF scheme, little information about the respective PDF is available from measurements. Several observational studies have been carried out to analyze the shape of the distributions of humidity-related quantities in different conditions. Wood and Field (2000) found complex and often bimodal PDFs of total water content with large values of skewness in data from flights through stratocumulus clouds, which capped a well-mixed planetary boundary layer (PBL) or were decoupled from it. Davis *et al.* (1996) reported similar results for liquid water content measured during flights through marine stratocumulus clouds, with PDFs that were bimodal as well as positively and negatively skewed. Moreover, Price (2001) classified various shapes of humidity distribution in the PBL, such as Gaussian, skewed, platykurtic and multimodal, by examining data from tethered-balloon measurements.

Another approach to obtaining information on the distribution of humidity-related quantities is to use cloud-resolving models (CRMs) to simulate meteorological cases in a specified region. Such an approach was taken, in addition to the use of observational data, by Bony and Emanuel (2001), who found close to Gaussian PDFs of total water at low levels and skewed ones at high levels in CRM simulations of the GATE Phase III experiment. Different results were obtained by Tompkins (2002), who found PDFs of total water mixing ratio that more closely resembled beta distributions in data produced by the large-eddy simulation model of the Met Office. These studies show that the shape of the derived PDF varies considerably depending on the data analyzed (flight measurements or CRMs). Flight datasets have the advantage of being measurements of the real atmosphere, but they depend on the environmental conditions and are available only along the narrow path of the flight route, i.e. it is not possible to measure the quantity over a large area at the same time. Simulated data from CRMs do provide such simultaneous area coverage; however, the results depend on the chosen spatial and temporal resolution as well as on the initial conditions needed to run the model. Satellite data combine the advantages of covering large regions and being real measurements of the atmosphere.

New retrievals of column water vapour and column cloud condensate with high spatial resolution (of the order of 5 × 5 km²) are available globally from satellites. These observational data allow horizontal PDFs to be constructed at the much coarser resolution of typical GCMs. Unfortunately, data with high spatial resolution only allow for the retrieval of vertically integrated total water (total water path, hereafter TWP), while instruments providing vertically resolved water vapour and cloud-condensate mixing ratio retrievals (such as infrared sounders) still have spatial resolutions that are too coarse (of the order of 20 × 20 km² and greater) to allow the construction of PDFs.
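Constructing a horizontal PDF at GCM resolution amounts to collecting all high-resolution pixels that fall inside one grid box and summarizing their distribution. The sketch below computes the mean, variance and skewness of such a pixel sample. It is only an illustration under stated assumptions: the function name `twp_moments` is hypothetical, missing retrievals are assumed to be flagged as `None` or NaN, and the moments are plain population moments, which may differ from the estimators used in the actual analysis.

```python
import math


def twp_moments(pixels):
    """Return (mean, variance, skewness) of the valid pixel values in one grid box.

    `pixels` is an iterable of total water path values; entries that are None
    or NaN are treated as missing retrievals and skipped (an assumption).
    """
    valid = [x for x in pixels if x is not None and not math.isnan(x)]
    n = len(valid)
    if n < 2:
        raise ValueError("need at least two valid pixels")
    mean = sum(valid) / n
    # Second and third central moments (population form, divided by n).
    m2 = sum((x - mean) ** 2 for x in valid) / n
    m3 = sum((x - mean) ** 3 for x in valid) / n
    # Skewness as the standardized third moment; zero if all values are equal.
    skewness = m3 / m2 ** 1.5 if m2 > 0.0 else 0.0
    return mean, m2, skewness
```

A sample with a few very moist pixels in an otherwise dry grid box, for example `[0.0, 0.0, 0.0, 4.0]`, yields a positive skewness, whereas a symmetric sample yields zero.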

This article presents an evaluation of the subgrid-scale variability scheme of total water mixing ratio developed by Tompkins (2002) in the ECHAM5 general circulation model of the atmosphere. The quantities defining the PDF, namely the modelled mean, variance and skewness of TWP, together with the total cloud cover, are compared with high-resolution satellite data. To make the three-dimensional modelled total water mixing ratio comparable to the two-dimensional satellite column total water retrieval, the modelled water vapour, cloud liquid water and cloud ice are vertically integrated, using the stochastic subcolumn generator developed by Räisänen *et al.* (2004) with maximum overlap, and summed to give TWP. Moreover, sensitivity experiments identify processes that explain part of the discrepancy between the parametrized variance and skewness and the observational data. Section 2 explains the model experiments and the methods used to analyze the model and satellite data. The data derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument are then discussed in section 3. Section 4 presents the comparison between the modelled parameters and those derived from MODIS, as well as the results of the sensitivity experiments. The evaluation closes with a summary of the results and conclusions in section 5.
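The vertical integration that turns layer values into a column total can be sketched as a hydrostatic, pressure-weighted sum over model layers. The code below is a hedged illustration only: it integrates grid-mean profiles and does not reproduce the stochastic subcolumn generator or any overlap assumption. The function names, the argument layout and the treatment of the input values as mass of water per unit mass of air are assumptions made for this example.

```python
G = 9.80665  # standard acceleration of gravity, m s^-2


def water_path(mass_fractions, pressure_interfaces):
    """Column-integrated water (kg m^-2) from layer values (kg kg^-1).

    `pressure_interfaces` holds the pressures (Pa) at the layer boundaries,
    so it must contain one more entry than `mass_fractions`.
    """
    if len(pressure_interfaces) != len(mass_fractions) + 1:
        raise ValueError("need one more interface than layers")
    total = 0.0
    for k, q in enumerate(mass_fractions):
        # Hydrostatic air mass per unit area of layer k is |dp| / g.
        dp = abs(pressure_interfaces[k + 1] - pressure_interfaces[k])
        total += q * dp / G
    return total


def total_water_path(vapour, liquid, ice, pressure_interfaces):
    """Sum of the vapour, cloud liquid and cloud ice paths (kg m^-2)."""
    return sum(water_path(q, pressure_interfaces) for q in (vapour, liquid, ice))
```

For a single layer between 900 hPa and 1000 hPa with a vapour content of 0.01 kg kg^-1, the sum gives 0.01 × 10000 / 9.80665, or about 10.2 kg m^-2.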