Journal of Geophysical Research: Atmospheres

Entropy-based method for extreme rainfall analysis in Texas

Corresponding author: Z. Hao, Department of Biological and Agricultural Engineering, Texas A&M University, College Station, Texas. (hzc07@tamu.edu)

Abstract

[1] Annual rainfall maxima are commonly used for rainfall analysis, which entails the use of a distribution for modeling extreme values. Analysis of rainfall data from different regions of Texas, USA, shows that the form of the frequency distribution of annual rainfall maxima changes with different time durations, climate zones, and distances from the Gulf of Mexico. Employing the entropy theory, an entropy-based distribution for modeling annual rainfall maxima is derived, which is expected to apply across different time durations, climate zones, and distances from the Gulf. The performance of the proposed distribution is assessed with synthetic data from known distributions, and results show that the performance of the proposed entropy-based distribution is generally comparable with the generalized extreme value (GEV) distribution and is preferable for highly skewed data. Comparison based on observed rainfall data also shows this attractive property of the proposed distribution. Thus, the entropy-based distribution provides a promising alternative for frequency analysis of extreme rainfall values. The proposed distribution is then applied to the annual rainfall maxima, and results show that the entropy-based distribution fits the empirical probability distribution well and also performs well in modeling extreme rainfall values for different time durations, climate zones, and distances from the Gulf.

1 Introduction

[2] Rainfall frequency analysis is used for constructing intensity-duration-frequency (IDF) curves, which are needed for a range of hydrologic designs, including drainage systems, culverts, roadways, parking lots, runways, and so on. Extreme rainfall values, such as annual rainfall maxima, are of interest in modeling floods and quantifying the effect of climate change. From the fitted distribution, statistical properties of extreme rainfall values can be investigated and extrapolated beyond the available data for engineering purposes.

[3] The generalized extreme value (GEV) distribution is one of the frequently employed probability distributions for modeling and characterizing extreme values. Derived from the extreme value theory, it is a three-parameter distribution encompassing three classes of distributions, namely, Gumbel, Frechet, and Weibull. This distribution has been used for extreme rainfall frequency analysis in different areas of the world. Schaefer [1990] used the GEV distribution for frequency analysis of annual rainfall maxima of durations of 2, 6, and 24 hours for the state of Washington. Huff and Angel [1992] selected the GEV distribution to model the distribution of annual rainfall maxima for durations from 5 minutes to 10 days in the mid-western United States. Parrett [1997] also used the GEV distribution to construct dimensionless frequency curves of annual rainfall maxima of durations of 2, 6, and 24 hours within each region in Montana. Using the L-moment ratio diagram, Asquith [1998] determined that the GEV distribution was an appropriate distribution for modeling the distribution of annual rainfall maxima for durations from 1 to 7 days. Alila [1999] showed that the annual rainfall maxima of durations from 5 minutes to 24 hours in Canada were better described by the GEV distribution than other distributions, such as the generalized logistic and EV1 distributions.

[4] Extreme rainfall exhibits different properties for different durations in different regions. Analysis of rainfall characteristics is important for choosing a suitable rainfall distribution and consequently estimating rainfall quantiles. Therefore, the objective of this study is to investigate the change in the form of the annual rainfall maxima frequency distribution with changes in the time duration, climate zone, and distance from the Gulf of Mexico and then derive an entropy-based distribution that is sufficiently flexible for characterizing rainfall distributions for different durations in different climatic zones or at different distances from the Gulf of Mexico. The performance of the proposed entropy-based distribution is assessed using synthetic data through Monte Carlo simulation and observed rainfall data and is shown to be a promising alternative distribution to the commonly used GEV distribution for modeling extreme rainfall values, especially observations with high skewness.

[5] This article is organized as follows. In section 2, the change in the form of empirical distributions of annual rainfall maxima is investigated. Using the entropy theory, a generalized distribution is derived in section 3, and the performance of this distribution is assessed by comparison with the GEV distribution in section 4. After the application of the proposed entropy-based distribution in section 5, conclusions are given in section 6.

2 Empirical Frequency Distributions

2.1 Study Area

[6] The area selected for this study is the state of Texas (longitude: 93°31'W to 106°38'W, latitude: 25°50'N to 36°30'N). The climate of Texas is strongly influenced by physical features, including the Gulf of Mexico. The passage of frontal systems from northwest and the moist air moving inland from the Gulf of Mexico are the two competing influences that dominate the climate of Texas, while proximity to the coast is the most important factor that determines the regional climatic differences in Texas [North et al., 1995].

[7] There are three major types of climate in Texas, which are classified as continental, mountain, and modified marine, with no clearly distinguishable boundaries, while the modified marine zone is further classified into four “subtropical” zones [Larkin and Bomar, 1983]. The mountain climate is dominant in several mountains of the Trans-Pecos region and is not included in this study. The climate zones of the continental and modified marine climates are abbreviated as continental steppe (CS), subtropical arid (SA), subtropical humid (SH), subtropical subhumid (SSH), and subtropical steppe (SST), the boundaries of which are approximated with circles in Figure 1. In addition, the U.S. National Weather Service (NWS) has divided Texas into 10 climate divisions (including Upper Coast, East Texas, High Plain, Trans-Pecos, and so on) (available at the website: http://www.nass.usda.gov/Statistics_by_State/Texas/Charts_&_Maps/cwmap.htm), which are also used in this study.

Figure 1.

Regions of climate zones in Texas and rainfall stations used in this study.

2.2 Data Description

[8] Rainfall data for 15-minute, hourly, and daily durations for 99 NWS stations, which are also shown in Figure 1, were obtained from the National Climatic Data Center (http://www.ncdc.noaa.gov). Not all stations have rainfall data for all three durations. To obtain a relatively long record of rainfall data for different durations, the 15-minute, hourly, and daily rainfall data were used as original data sources, and rainfall data of other durations were then produced from these original data. At different stations, there are missing values for some periods for each time duration. Only years with at least 9 months of observations were selected for this study. The 15-minute data are available for a few stations and cover a relatively short period. The hourly and daily rainfall data are available for more stations and for longer periods. The 45-minute annual maxima were compiled from the 15-minute data. Likewise, the rainfall data for the 12-hour duration and for the 7- and 30-day durations were compiled from hourly and daily data, respectively. Annual rainfall maxima data were then obtained from these rainfall data for different durations, climate zones, and distances from the Gulf.
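The aggregation step described above can be sketched as a moving sum followed by a per-year maximum. This is a minimal sketch, not the authors' processing code: the function name `annual_maxima`, the input layout (time-ordered `(year, depth)` pairs at a fixed time step), and the convention of assigning each window to the year of its last step are assumptions made for illustration. The screening for years with enough observations is omitted, and depths are assumed to be non-negative.

```python
from collections import defaultdict

def annual_maxima(records, window):
    """Annual maxima of moving-window rainfall totals.

    records: time-ordered (year, depth) pairs at a fixed step (assumed layout).
    window:  number of consecutive steps to add up, e.g. 3 for 45-minute
             totals built from 15-minute data.
    """
    depths = [depth for _, depth in records]
    maxima = defaultdict(float)
    running = 0.0
    for i, (year, depth) in enumerate(records):
        running += depth                    # add the newest step
        if i >= window:
            running -= depths[i - window]   # drop the step that left the window
        if i >= window - 1:                 # a full window is available
            # Assumption: a window belongs to the year of its last step.
            maxima[year] = max(maxima[year], running)
    return dict(maxima)
```

With 15-minute input, `annual_maxima(records, 3)` would give 45-minute annual maxima; with hourly input, a window of 12 would give the 12-hour series.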

2.3 Change With the Time Duration

[9] Histograms of annual rainfall maxima of different durations were prepared for all stations used in this study, with the number of bins approximately equal to the square root of the number of observations [Montgomery and Runger, 2010]. The histograms of a sample station (411956) are shown in Figure 2, together with the length of observations (n). It was observed that frequency distributions for short durations were more skewed, while those for long durations were less skewed. For example, annual rainfall maxima data for station 411956 had a skewness value of 2.7 for 15-minute data, but 1.1 for 30-day data. To further show this characteristic, the box plot of skewness values for 40 data sets from a relatively long record (≥22 years) of different durations is given in Figure 3. For example, the 75th percentile of skewness of the 15-minute duration was around 3.2, while that for the 30-day duration was 1.1. Although a general trend of skewness across time durations cannot be established from the selected data sets, a slight tendency toward higher skewness for short durations is revealed. This is partly because, for short durations such as 15 minutes, a large amount of rainfall may occasionally occur within a short time, producing large skewness, while for long durations, such as 30 days, the rainfall is averaged and thus exhibits less skewness.
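The two quantities used in this paragraph, the square-root rule for the number of histogram bins and the sample skewness, can be sketched as follows. The article does not state which skewness estimator was used, so the moment-based (uncorrected) estimator below is an assumption, and the function names are illustrative only.

```python
import math

def n_bins(n):
    """Square-root rule: number of bins close to sqrt(number of observations)."""
    return max(1, round(math.sqrt(n)))

def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5
```

A record of 40 annual maxima would be binned into `n_bins(40)` classes, and a long right tail shows up as a positive value of `skewness`.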

Figure 2.

Histograms and probability density functions of annual rainfall maxima of different durations for station 411956 in the SH zone.

Figure 3.

Box plot of skewness values for annual rainfall maxima of different durations (40 data sets for each duration).

2.4 Change With the Climate Zone

[10] In this section, frequency distributions were analyzed for different climate zones. No clear pattern of frequency distributions in the SSH and SST zones was found from the data from several stations selected in this study. Therefore, only the frequency distributions for the rest of the climate zones were analyzed. Two stations for each climate zone were selected to illustrate the typical frequency distribution for 12-hour annual rainfall maxima, as shown in Figure 4. The length (n) of record and the variance of data (σ²) are also shown in the figure.

Figure 4.

Histograms and probability density functions of 12-hour annual rainfall maxima from different climate zones.

2.4.1 SH

[11] The SH zone lies in the eastern part of Texas, which is mostly noted for warm summers [Larkin and Bomar, 1983]. Ten stations were selected for the study. This zone includes most parts of the Upper Coast and East Texas division. There are four rainfall-generating mechanisms that exist in the Upper Coast area, leading to varying patterns from year to year as one or more of these controls change: In May, the typical thunderstorm pattern is expected slightly inland, while the belt of maximum activity is along the coast by July; in September, tropical disturbances can cause very heavy rains for some years, while in December frontal activity affects the region [National Fibers Information Center, 1987]. The East Texas division is characterized by a fairly uniform seasonal rainfall, with slight maxima occurring in May and December, and there is little variation in the weather in the summer season because the influence of the Gulf of Mexico is dominant [National Fibers Information Center, 1987]. The most widespread and lengthy precipitation periods in East Texas during spring and autumn occur when the cold air forms a barrier, forcing the overriding moist Gulf air to be deflected upward where it cools and condenses [Carr, 1967].

[12] Two stations (411956 and 410569) were used for illustration of the typical frequency distribution, and the histograms are shown in Figures 4a and 4b. It can be seen that frequency distributions are relatively smooth for this duration, with higher variance than that for climate zones CS and SA. This region is along the coast, and the rainfall pattern is affected by the Gulf of Mexico. Since proximity to the coast is the most important factor determining regional climate differences [North et al., 1995], this frequency distribution pattern may be due to the moderating influence of moisture from the Gulf of Mexico.

2.4.2 CS

[13] The CS zone lies in the northwestern part of Texas and includes the regions similar to the High Plain division. The rainfall amount increases steadily through spring and reaches a maximum in May or June, while the thunderstorm activity is also on the rise during the spring season [National Fibers Information Center, 1987]. In this region, summer is the wet season and thunderstorms are numerous in June and July, but begin to decrease in August [National Fibers Information Center, 1987]. Two stations (414098 and 415411) were used for illustration of the typical frequency distribution and the histograms for 12-hour annual rainfall maxima are shown in Figures 4c and 4d. The variances for the two stations are not as high as those for the SH climate zone. The frequency distributions in this part are relatively sharp, compared with those from the SH climate zone. The reason may be that the maximum rainfall mainly comes from the thunderstorms during the summer season.

2.4.3 SA

[14] The SA zone lies in the extreme western part of Texas and includes the region similar to the Trans-Pecos division. The basin and plateau region of the Trans-Pecos features a subtropical arid climate, which is marked by summertime rainfall anomalies of the mountain relief [Larkin and Bomar, 1983]. Rainfall reaches its maximum in July; in summer, the rain comes mainly from thunderstorms, which are often affected by local topography [National Fibers Information Center, 1987]. In the Trans-Pecos region, the largest percentage of rainfall is due to convective showers and thundershower activity, while thundershower activity is the primary contributor of rainfall during late-summer and early-autumn months [Carr, 1967]. Two stations (416893 and 412797) were selected for illustration of the typical frequency distribution, and the histograms of the 12-hour annual rainfall maxima are shown in Figures 4e and 4f. The variances for the two stations are relatively small and the frequency distributions are relatively sharp, compared with those from the SH climate zone. The reason for the variation of rainfall may be that the heavy rainfall in SA is mainly produced by convective shower and thundershower activity.

[15] In general, frequency distributions for regions in extremely northern and western parts (or the CS and SA climate zones) were sharp; however, those for the regions in the southeast near the Gulf of Mexico (or the SH climate zone) were rather smooth. Although only a few of the possible mechanisms of rainfall in each region were investigated, the analysis provided an insight into the reason for the specific rainfall frequency distribution pattern in each climate zone.

2.5 Influence of the Distance From the Gulf

[16] The Gulf of Mexico is particularly important for the climate of Texas, as it provides the source of moisture and modulates the average seasonal and diurnal cycles, particularly in the coastal regions [North et al., 1995]. In general, the average annual rainfall decreases with increasing distance from the Gulf of Mexico. To assess the effect of the Gulf of Mexico on the distribution of annual rainfall maxima, 20 stations were selected and divided into two groups, each with 10 stations according to the distance from the Gulf of Mexico. The histograms of 12-hour maximum rainfall for four sample stations are shown in Figure 5. It can be seen that the frequency distributions in group II (more than 250 miles away from the Gulf) are not as smooth as those in group I (within 60 miles from the Gulf), and the variances for the stations in group II are not as high as those in group I. The smoothness of frequency distributions in group I is partly due to the closeness of rainfall stations to the Gulf of Mexico. The effect of the Gulf of Mexico is reduced with distance, and the topography may also play an important role in the rainfall-generating mechanism. The frequency distribution pattern for the two stations in group II may be due to the mixed effect of the Gulf of Mexico and topography.

Figure 5.

Histograms and probability density functions of 12-hour annual rainfall maxima of different distances from the Gulf of Mexico (414309, 60 miles; 412015, 20 miles; 411698, 480 miles; 412621, 450 miles).

[17] It is clear that the frequency distribution varies with the duration, climate zone, and distance from the Gulf. The question arises if a probability distribution can accommodate the effect of these factors. This is addressed in what follows.

3 Annual Rainfall Maxima Distribution Using Entropy Theory

3.1 Derivation of Distribution

[18] Let the annual rainfall maxima for a given duration be represented as a continuous random variable, X ∈ [a, b], with a probability density function (pdf), f(x). For f(x), the Shannon entropy, E, can be defined as shown in equation (1) [Shannon, 1948; Shannon and Weaver, 1949]:

$$E = -\int_a^b f(x) \ln f(x)\,dx \qquad (1)$$

where x is a value of random variable X with lower limit a and upper limit b. Jaynes [1957] developed the principle of maximum entropy, which states that the probability density function should be selected among all the distributions with the maximum entropy subject to the given constraints. The constraints can be expressed in general form as shown in equation (2):

$$\int_a^b g_r(x) f(x)\,dx = E(g_r), \quad r = 0, 1, \ldots, m \qquad (2)$$

where g_r(x) is a known function with g_0(x) = 1, E(g_r) is the expected value of g_r(x) obtained from observations, with E(g_0) = 1 (e.g., if g_r(x) = x, then E(g_r) is the mean of x), and m is the number of constraints.

[19] The maximum entropy-based probability density function can then be obtained by maximizing the entropy in equation (1), subject to equation (2), using the method of Lagrange multipliers, as shown in equation (3) [Kesavan and Kapur, 1992]:

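$$f(x) = \exp\left[-\lambda_0 - \lambda_1 g_1(x) - \lambda_2 g_2(x) - \cdots - \lambda_m g_m(x)\right] \qquad (3)$$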

where λ_r (r = 0, 1, …, m) are the Lagrange multipliers.
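The step from the constrained maximization to the exponential form of equation (3) can be written out briefly. The sketch below assumes the usual forms of equations (1) and (2), that is, E = −∫ f ln f dx and ∫ g_r f dx = E(g_r), and the common convention of writing the multiplier of the normalization constraint as (λ_0 − 1); other sign conventions lead to the same family of densities.

```latex
% Lagrangian for maximizing (1) subject to the m + 1 constraints in (2)
L = -\int_a^b f(x)\ln f(x)\,dx
    - (\lambda_0 - 1)\left[\int_a^b f(x)\,dx - 1\right]
    - \sum_{r=1}^{m} \lambda_r \left[\int_a^b g_r(x) f(x)\,dx - E(g_r)\right]

% Setting the variation with respect to f to zero
\frac{\partial L}{\partial f}
  = -\ln f(x) - 1 - (\lambda_0 - 1) - \sum_{r=1}^{m} \lambda_r g_r(x) = 0
\;\Longrightarrow\;
f(x) = \exp\left[-\lambda_0 - \sum_{r=1}^{m} \lambda_r g_r(x)\right]

% The normalization constraint (r = 0) fixes lambda_0
\lambda_0 = \ln \int_a^b \exp\left[-\sum_{r=1}^{m} \lambda_r g_r(x)\right] dx
```

The last line shows that λ_0 is not a free parameter: it is determined by the remaining multipliers.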

3.2 Maximum Entropy Distribution With Moments as Constraints

[20] With the first four moments as constraints, the maximum entropy-based probability density function (denoted as ENT4) defined on the interval [a, b], with the function g(x) in equation (2) expressed as g_i(x) = x^i (i = 1, 2, 3, and 4), can be expressed as shown by equation (4):

$$f(x) = \exp\left(-\lambda_0 - \lambda_1 x - \lambda_2 x^2 - \lambda_3 x^3 - \lambda_4 x^4\right) \qquad (4)$$

[21] In this study, the lower limit of the interval, a, was set to be zero, while the upper limit, b, was set to be 20 times the observed maximum value. Since higher moments are involved in this distribution, a relatively large data set would be needed for the accuracy of moment estimation.

[22] With the first four moments as constraints, the skewness, kurtosis, and multiple modes can be included in the resulting maximum entropy-based distribution [Zellner and Highfield, 1988]. Each maximum of the polynomial inside the exponential corresponds to one mode, and thus multiple modes may exist in the maximum entropy distribution [Smith, 1993]. This distribution has been applied for fitting bimodal distributions [Eisenberger, 1964]. Matz [1978] examined this distribution in detail, and its application showed that it fitted the observed frequencies well. Comparing this distribution with the Pearson type III distribution, Zellner and Highfield [1988] showed that it provided a better fit, especially at the tails. Smith [1993] used the maximum entropy-based distribution with moments as constraints for decision analysis to construct the distribution of value lottery and showed that the distribution with the first four moments as constraints performed well.
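The remark about multiple modes can be illustrated numerically. The sketch below assumes the ENT4 density has the form f(x) = exp(−λ_0 − λ_1 x − λ_2 x² − λ_3 x³ − λ_4 x⁴) with λ_0 fixed by normalization. The multipliers, the interval [0, 4], and the function names are hypothetical values chosen only so that the polynomial equals (x − 1)²(x − 3)² − 9; they are not fitted to any rainfall data.

```python
import math

def ent4_density(lams, a, b, n_grid=4001):
    """Return grid points, ENT4 density values on [a, b], and lambda_0.

    lams = (lambda_1, ..., lambda_4); lambda_0 follows from normalization.
    """
    h = (b - a) / (n_grid - 1)
    xs = [a + i * h for i in range(n_grid)]
    # Polynomial inside the exponential, without the lambda_0 term.
    expo = [-sum(l * x ** (r + 1) for r, l in enumerate(lams)) for x in xs]
    top = max(expo)                          # shift to avoid overflow
    un = [math.exp(e - top) for e in expo]
    # Trapezoidal rule for the normalizing integral.
    area = h * (sum(un) - 0.5 * (un[0] + un[-1]))
    lam0 = top + math.log(area)
    fx = [v / area for v in un]
    return xs, fx, lam0

def modes(xs, fx):
    """Interior grid points where the density has a local maximum."""
    return [xs[i] for i in range(1, len(xs) - 1)
            if fx[i] > fx[i - 1] and fx[i] > fx[i + 1]]

# Hypothetical multipliers: the polynomial is (x - 1)^2 (x - 3)^2 - 9.
xs, fx, lam0 = ent4_density((-24.0, 22.0, -8.0, 1.0), 0.0, 4.0)
print(modes(xs, fx))
```

For these illustrative multipliers the density has two modes, near x = 1 and x = 3, where the polynomial inside the exponential reaches its maxima.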

[23] In this study, the entropy-based distribution in equation (4) is proposed as an alternative for modeling extreme rainfall values. In addition, the entropy-based distribution with the first three moments as constraints was also selected as a candidate for modeling extreme rainfall values. From equation (3), this distribution can be expressed as shown by equation (5):

$$f(x) = \exp\left(-\lambda_0 - \lambda_1 x - \lambda_2 x^2 - \lambda_3 x^3\right) \qquad (5)$$

[24] This entropy-based distribution has three parameters and is denoted as ENT3 in this study.

3.3 Estimation of Parameters

[25] The Lagrange multipliers of equation (4) can be determined using equation (2), where E(x^r) (r = 1, …, 4) are the first four noncentral moments estimated from the observations. Generally, an analytical solution does not exist and numerical estimation of the Lagrange multipliers is needed. To that end, one can maximize the function shown in equation (6) [Mead and Papanicolaou, 1984; Wu, 2003]:

display math(6)

[26] The maximization can be achieved by employing Newton's method. Starting from some initial value, λ(0), one can solve for Lagrange parameters by updating λ(1) through equation (7) given below:

$$\lambda^{(1)} = \lambda^{(0)} - H^{-1}\Gamma \qquad (7)$$

where the gradient, Γ, is expressed as shown by equation (8):

display math(8)

and H is the Hessian matrix whose elements are expressed as shown by equation (9):

display math(9)
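The Newton iteration described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the dual function is written in the convex form ln ∫ exp(−Σ λ_r x^r) dx + Σ λ_r μ_r, which is minimized (sign conventions may differ from those of equations (6), (8), and (9)); integrals use a fixed trapezoidal grid; a backtracking line search is added to the plain Newton update for robustness; and the names `solve` and `fit_maxent` are illustrative.

```python
import math

def solve(A, y):
    """Solve A d = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [v] for row, v in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    d = [0.0] * n
    for i in reversed(range(n)):
        d[i] = (M[i][n] - sum(M[i][j] * d[j] for j in range(i + 1, n))) / M[i][i]
    return d

def fit_maxent(moments, a, b, n_grid=2001, tol=1e-8, max_iter=100):
    """Multipliers of f(x) = exp(-lam0 - sum_r lam_r x^r) matching `moments`."""
    m = len(moments)
    h = (b - a) / (n_grid - 1)
    xs = [a + i * h for i in range(n_grid)]
    ws = [h] * n_grid
    ws[0] = ws[-1] = h / 2                     # trapezoidal weights

    def state(lam):
        expo = [-sum(l * x ** (r + 1) for r, l in enumerate(lam)) for x in xs]
        top = max(expo)                        # shift to avoid overflow
        p = [w * math.exp(e - top) for w, e in zip(ws, expo)]
        z = sum(p)
        log_z = top + math.log(z)              # equals lam0 at the solution
        mom = [sum(pi * x ** k for pi, x in zip(p, xs)) / z
               for k in range(1, 2 * m + 1)]   # model moments, orders 1..2m
        dual = log_z + sum(l * mu for l, mu in zip(lam, moments))
        return dual, mom, log_z

    lam = [0.0] * m
    dual, mom, log_z = state(lam)
    for _ in range(max_iter):
        grad = [moments[r] - mom[r] for r in range(m)]
        if max(abs(g) for g in grad) < tol:
            break
        # Hessian of the dual: covariance matrix of (x, x^2, ..., x^m) under f.
        hess = [[mom[r + s + 1] - mom[r] * mom[s] for s in range(m)]
                for r in range(m)]
        step = solve(hess, [-g for g in grad])
        slope = sum(g * d for g, d in zip(grad, step))
        t = 1.0
        while True:                            # backtracking line search
            trial = [l + t * d for l, d in zip(lam, step)]
            new = state(trial)
            if new[0] <= dual + 1e-4 * t * slope + 1e-12 or t < 1e-8:
                break
            t *= 0.5
        lam, (dual, mom, log_z) = trial, new
    return log_z, lam
```

Calling `fit_maxent([m1, m2, m3, m4], a, b)` with four sample moments would return λ_0 and (λ_1, …, λ_4). With a very wide interval, such as an upper limit many times the observed maximum, the powers of x become large, so rescaling the data before fitting is a sensible precaution in practice.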

4 Model Evaluation

4.1 Performance Measure

[27] To quantify the performance of the proposed distribution in modeling the extreme rainfall quantiles, the root mean square error (RMSE) was used, which can be defined as shown in equation (10):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - o_i\right)^2} \qquad (10)$$

where n is the length of the observed data, x_i are the quantiles estimated from the proposed distribution, and o_i are the observed quantiles corresponding to the empirical nonexceedance probabilities estimated by the plotting position formula. In this study, the Gringorten plotting position formula was used as shown in equation (11) [Gringorten, 1963]:

$$p_i = \frac{i - 0.44}{n + 0.12} \qquad (11)$$

where i is the rank of the observed values and n is the length of the observed data.
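The performance measure can be sketched directly from these definitions. The sketch assumes that ranks are assigned in ascending order (i = 1 for the smallest value), so that the Gringorten formula returns nonexceedance probabilities; the function names are illustrative.

```python
import math

def gringorten(n):
    """Gringorten plotting positions for ranks i = 1..n (ascending order)."""
    return [(i - 0.44) / (n + 0.12) for i in range(1, n + 1)]

def rmse(estimated, observed):
    """Root mean square error between estimated and observed quantiles."""
    n = len(observed)
    return math.sqrt(sum((x - o) ** 2 for x, o in zip(estimated, observed)) / n)
```

In use, the observed annual maxima would be sorted in ascending order, each fitted distribution would be inverted at the probabilities from `gringorten(n)`, and `rmse` would compare the resulting quantiles with the sorted observations.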

4.2 Synthetic Data From Known Distribution

[28] Monte Carlo experiments were first carried out to compare the quantiles estimated from the GEV, ENT4, and ENT3 distributions. Two Monte Carlo simulations were conducted with random numbers generated from the known GEV and lognormal distributions. Random numbers of three different lengths (namely, 40, 70, and 100) were generated, which were used to approximate the record length of the 15-minute, hourly, and daily rainfall data in this study. For the first simulation (S1), the quantiles corresponding to different return periods (T = 5, 10, 25, 50, 100, and 200 years) were first assessed with the random numbers generated from the GEV distribution. For the second simulation (S2), the quantiles corresponding to a relatively long return period (T = 100 and 200 years) from the three distributions were assessed with the synthetic data generated from lognormal distribution with different skewness values.

4.2.1 Random Number From GEV Distribution

[29] The GEV distribution has been applied extensively in hydrology for extreme rainfall analysis. Its probability density function is defined as shown by equation (12):

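$$f(x) = \frac{1}{\sigma}\left[1 + k\,\frac{x - u}{\sigma}\right]^{-1 - 1/k} \exp\left\{-\left[1 + k\,\frac{x - u}{\sigma}\right]^{-1/k}\right\}, \quad 1 + k\,\frac{x - u}{\sigma} > 0 \qquad (12)$$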

where k, σ, and u are the shape, scale, and location parameters. In this study, the MATLAB (The Mathworks, Inc., Natick, Mass.) function, gevfit, was used for the parameter estimation of the GEV distribution with maximum likelihood method.
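For a return period T, the corresponding quantile follows from inverting the GEV distribution function at the nonexceedance probability 1 − 1/T. The sketch below assumes the parameterization used by MATLAB's `gevfit` (shape k, scale σ, location u, with k ≠ 0); the function name is illustrative.

```python
import math

def gev_quantile(T, k, sigma, u):
    """Quantile with return period T (years) for the GEV with shape k != 0."""
    p = 1.0 - 1.0 / T                    # nonexceedance probability
    return u + sigma / k * ((-math.log(p)) ** (-k) - 1.0)

# Parent distribution of simulation S1: (k, sigma, u) = (0.3, 0.3, 1.2)
for T in (5, 10, 25, 50, 100, 200):
    print(T, round(gev_quantile(T, 0.3, 0.3, 1.2), 2))
```

With the parent parameters of simulation S1, these values should agree, to within rounding, with the "Observed Quantile" column of Table 1.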

[30] One thousand data sets of random numbers with different sample sizes (n = 40, 70, and 100) were generated from this parent distribution. The GEV, ENT4, and ENT3 distributions were then fitted to these data sets and the quantiles corresponding to different return periods were obtained. Parameters (k, σ, and u) of the parent probability density function were assigned values of 0.3, 0.3, and 1.2, respectively, and the probability density function is shown in Figure 6a. The median and the RMSE values of the estimated quantiles for simulation S1 are shown in Tables 1 and 2. From the median values, it can be seen that for short return periods (T ≤ 50 years), the median values from the ENT4 and GEV distributions were close to each other for each sample size. For example, for sample size n = 100, the median values from GEV and ENT4 for a return period of 50 years were 3.42 and 3.40, respectively, while the observed value was 3.42. Generally, the RMSE values of the ENT4 distribution were slightly larger than those of the GEV distribution; however, these results were acceptable. For the quantiles corresponding to the relatively long return periods (100 and 200 years), the median quantile from the ENT4 distribution was moderately underestimated, while that from the GEV distribution was close to the observed value. This is not unexpected, since the random numbers were generated from the GEV distribution and then the GEV distribution was fitted. Generally, ENT4 modeled the data generated from the GEV distribution well, especially when the sample size was relatively large. The ENT3 distribution also estimated the quantiles relatively well for short return periods (T ≤ 25 years), while it did not model the quantiles well for relatively long return periods (T ≥ 50 years).
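The structure of this experiment (generate many samples from the parent, estimate a quantile from each, then summarize by median and RMSE) can be sketched as follows. This is only an outline under stated assumptions: samples are drawn by inverse-transform sampling, and a simple order-statistic estimator stands in for the fitted GEV, ENT4, and ENT3 distributions, whose fitting is not reproduced here. All function names and the seed are illustrative.

```python
import math
import random
import statistics

def gev_inverse_cdf(p, k, sigma, u):
    """Inverse of the GEV distribution function (shape k != 0)."""
    return u + sigma / k * ((-math.log(p)) ** (-k) - 1.0)

def empirical_quantile(sample, p):
    """Stand-in estimator: an order statistic of the sorted sample."""
    s = sorted(sample)
    return s[min(len(s) - 1, int(p * len(s)))]

def monte_carlo(estimator, T, n, reps=1000, seed=1, k=0.3, sigma=0.3, u=1.2):
    """Median and RMSE of an estimated T-year quantile over `reps` samples."""
    rng = random.Random(seed)
    p = 1.0 - 1.0 / T
    truth = gev_inverse_cdf(p, k, sigma, u)
    estimates = []
    for _ in range(reps):
        # Inverse-transform sampling; uniforms are kept away from zero.
        sample = [gev_inverse_cdf(max(rng.random(), 1e-12), k, sigma, u)
                  for _ in range(n)]
        estimates.append(estimator(sample, p))
    median = statistics.median(estimates)
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / reps)
    return truth, median, rmse

truth, median, rmse = monte_carlo(empirical_quantile, T=5, n=100)
```

Replacing `empirical_quantile` with a function that fits a candidate distribution and returns its quantile would give the kind of median and RMSE summaries reported in Tables 1 and 2.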

Figure 6.

Parent distributions for Monte Carlo simulation. (a) GEV distribution and (b) lognormal distribution with different skewness (s).

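Table 1. Median of Estimated Quantiles With Random Numbers Generated From the GEV Distribution

| Return Period (Years) | Observed Quantile | GEV (n = 40) | ENT4 (n = 40) | ENT3 (n = 40) | GEV (n = 70) | ENT4 (n = 70) | ENT3 (n = 70) | GEV (n = 100) | ENT4 (n = 100) | ENT3 (n = 100) |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 1.77 | 1.75 | 1.74 | 1.97 | 1.76 | 1.74 | 1.99 | 1.76 | 1.75 | 2.02 |
| 10 | 2.16 | 2.14 | 2.09 | 2.22 | 2.16 | 2.08 | 2.25 | 2.15 | 2.07 | 2.30 |
| 25 | 2.81 | 2.76 | 2.80 | 2.49 | 2.78 | 2.77 | 2.54 | 2.80 | 2.71 | 2.60 |
| 50 | 3.42 | 3.35 | 3.36 | 2.67 | 3.37 | 3.42 | 2.72 | 3.42 | 3.40 | 2.79 |
| 100 | 4.18 | 4.11 | 3.57 | 2.83 | 4.09 | 3.86 | 2.89 | 4.16 | 4.21 | 2.96 |
| 200 | 5.10 | 5.04 | 3.69 | 2.98 | 5.03 | 4.05 | 3.04 | 5.06 | 4.42 | 3.12 |

Table 2. RMSE of Estimated Quantiles With Random Numbers Generated From the GEV Distribution

| Return Period (Years) | GEV (n = 40) | ENT4 (n = 40) | ENT3 (n = 40) | GEV (n = 70) | ENT4 (n = 70) | ENT3 (n = 70) | GEV (n = 100) | ENT4 (n = 100) | ENT3 (n = 100) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 0.14 | 0.15 | 0.38 | 0.10 | 0.12 | 0.35 | 0.09 | 0.11 | 0.36 |
| 10 | 0.25 | 0.28 | 0.47 | 0.19 | 0.22 | 0.41 | 0.16 | 0.18 | 0.44 |
| 25 | 0.55 | 0.77 | 0.66 | 0.40 | 0.52 | 0.59 | 0.34 | 0.42 | 0.62 |
| 50 | 0.96 | 1.70 | 0.95 | 0.69 | 1.02 | 0.91 | 0.57 | 0.97 | 0.90 |
| 100 | 1.64 | 1.74 | 1.43 | 1.15 | 2.28 | 1.45 | 0.93 | 2.67 | 1.37 |
| 200 | 2.73 | 2.00 | 2.12 | 1.86 | 2.38 | 2.36 | 1.46 | 2.74 | 2.05 |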

4.2.2 Random Number From Lognormal Distribution

[31] The probability density function of the log-normal distribution can be expressed as shown in equation (13):

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x - u)^2}{2\sigma^2}\right], \quad x > 0 \qquad (13)$$

where u is the mean and σ² is the variance in the log scale. The skewness coefficient, s, is related to the variance, σ², as s = (e^{σ²} + 2)(e^{σ²} − 1)^{1/2}.

[32] One thousand data sets of random numbers with different sample sizes (n = 40, 70, and 100) with different skewness values of 1, 2, 2.5, and 3 were generated from the lognormal distribution and were used for comparison. Parameter u was assigned a value of 0.3, while the standard deviations corresponding to different skewness values were assigned values of 0.31, 0.55, 0.64, and 0.72, respectively. The pdfs of the parent distribution with these parameters are shown in Figure 6b. The objective of this simulation was to show the performance of these distributions in modeling data with different values of skewness. The median and RMSE values of the estimated quantiles, x100 and x200, for return periods of 100 and 200 years corresponding to nonexceedance probabilities of 0.99 and 0.995 are shown in Tables 3 and 4.
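The link between the skewness values and the lognormal parameters can be checked numerically. The sketch below assumes that σ is the standard deviation in the log scale and that the skewness is s = (e^{σ²} + 2)(e^{σ²} − 1)^{1/2}; under this reading, the standard deviations quoted above and the "Observation" column of Table 3 should be reproduced to within rounding. The function names are illustrative.

```python
import math
from statistics import NormalDist

def lognormal_skewness(sigma):
    """Skewness of a lognormal variable with log-scale standard deviation sigma."""
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)

def sigma_for_skewness(s, lo=1e-6, hi=3.0, iters=100):
    """Bisection on sigma; the skewness increases monotonically with sigma."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lognormal_skewness(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lognormal_quantile(T, u, sigma):
    """Quantile with return period T (years) of the lognormal distribution."""
    p = 1.0 - 1.0 / T                    # nonexceedance probability
    return math.exp(u + sigma * NormalDist().inv_cdf(p))

for s in (1.0, 2.0, 2.5, 3.0):
    sig = sigma_for_skewness(s)
    print(s, round(sig, 2), round(lognormal_quantile(100, 0.3, sig), 2))
```

The loop prints, for each skewness, the implied log-scale standard deviation and the 100-year quantile for u = 0.3.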

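Table 3. Median of Estimated Quantiles (x100 and x200) With Random Numbers Generated From the Lognormal Distribution With Different Skewness (k)

| Skewness | Quantile | Observation | GEV (n = 40) | ENT4 (n = 40) | ENT3 (n = 40) | GEV (n = 70) | ENT4 (n = 70) | ENT3 (n = 70) | GEV (n = 100) | ENT4 (n = 100) | ENT3 (n = 100) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| k = 1 | x100 | 2.80 | 2.73 | 2.61 | 2.44 | 2.77 | 2.69 | 2.46 | 2.80 | 2.74 | 2.48 |
| k = 1 | x200 | 3.03 | 2.95 | 2.72 | 2.55 | 2.99 | 2.83 | 2.58 | 3.01 | 2.88 | 2.59 |
| k = 2 | x100 | 4.87 | 5.12 | 4.34 | 3.85 | 5.07 | 4.60 | 3.88 | 5.13 | 4.77 | 3.90 |
| k = 2 | x200 | 5.59 | 6.02 | 4.54 | 4.11 | 5.98 | 4.88 | 4.14 | 6.08 | 5.05 | 4.17 |
| k = 2.5 | x100 | 5.99 | 6.48 | 5.15 | 4.62 | 6.57 | 5.58 | 4.75 | 6.60 | 5.92 | 4.81 |
| k = 2.5 | x200 | 7.03 | 7.94 | 5.39 | 4.97 | 8.09 | 5.91 | 5.12 | 8.14 | 6.30 | 5.21 |
| k = 3 | x100 | 7.13 | 7.84 | 5.93 | 5.35 | 8.25 | 6.83 | 5.78 | 8.34 | 7.19 | 5.85 |
| k = 3 | x200 | 8.53 | 10.00 | 6.21 | 5.81 | 10.57 | 7.26 | 6.30 | 10.66 | 7.70 | 6.42 |

Table 4. RMSE of Estimated Quantiles (x100 and x200) With Random Numbers Generated From the Lognormal Distribution With Different Skewness (k)

| Skewness | Quantile | GEV (n = 40) | ENT4 (n = 40) | ENT3 (n = 40) | GEV (n = 70) | ENT4 (n = 70) | ENT3 (n = 70) | GEV (n = 100) | ENT4 (n = 100) | ENT3 (n = 100) |
|---|---|---|---|---|---|---|---|---|---|---|
| k = 1 | x100 | 0.45 | 0.43 | 0.41 | 0.32 | 0.36 | 0.36 | 0.25 | 0.30 | 0.35 |
| k = 1 | x200 | 0.61 | 0.51 | 0.52 | 0.43 | 0.45 | 0.47 | 0.33 | 0.39 | 0.46 |
| k = 2 | x100 | 1.80 | 1.41 | 1.23 | 1.18 | 1.26 | 1.09 | 0.95 | 1.19 | 1.03 |
| k = 2 | x200 | 2.82 | 1.64 | 1.64 | 1.79 | 1.44 | 1.51 | 1.43 | 1.38 | 1.45 |
| k = 2.5 | x100 | 2.65 | 1.92 | 1.71 | 1.92 | 1.95 | 1.52 | 1.63 | 1.69 | 1.36 |
| k = 2.5 | x200 | 4.37 | 2.30 | 2.34 | 3.06 | 2.22 | 2.12 | 2.60 | 1.96 | 1.95 |
| k = 3 | x100 | 3.95 | 2.62 | 2.43 | 3.05 | 2.75 | 1.93 | 2.41 | 2.70 | 1.69 |
| k = 3 | x200 | 6.83 | 3.14 | 3.28 | 5.13 | 3.00 | 2.77 | 4.02 | 3.03 | 2.57 |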

[33] For the case with skewness coefficient k = 1, the median quantile from the ENT4 distribution was not as close to the observed values as from the GEV distribution. However, the difference between the median quantile estimated from GEV and that from ENT4 was relatively small, especially for relatively large sample sizes. For example, for n = 100, the median values from GEV and ENT4 were 2.80 and 2.74, with the observed value being 2.80. Generally, the RMSE values of the two distributions were close to each other. For example, the RMSE values of GEV and ENT4 for x200 were 0.43 and 0.45, respectively, for n = 70. The performance of ENT4 improved with the increase in sample size. Generally, the performances of ENT4 and GEV were comparable in this case.

[34] For skewness values of k = 2 and 2.5, the median values from GEV distribution were moderately overestimated, while those from ENT4 were moderately underestimated. When the sample size was relatively small (n = 40), the GEV distribution performed slightly better than did the ENT4 distribution for the median values. However, the RMSE value from the GEV was higher than that from the ENT4 distribution. When the sample size was relatively large (n = 100), the ENT4 distribution performed relatively better than did the GEV distribution for the median value, while their performance was comparable for the RMSE values. For example, for the case with k = 2.5 and sample size n = 100, the median values from GEV and ENT4 corresponding to the 100-year return period were 6.60 and 5.92, while the observed value was 5.99. The corresponding RMSE values for GEV and ENT4 were, respectively, 1.63 and 1.69, which are comparable. The performance of the ENT4 distribution improved with the increase in sample size.

[35] For the skewness k = 3, the median value estimated from GEV distribution was overestimated significantly, while ENT4 still performed relatively well for estimating quantiles, especially when the sample size was relatively large. For example, the true quantile corresponding to the 100-year return period was 7.13, while the quantiles from GEV and ENT4 with sample size n = 70 were 8.25 and 6.83, respectively. The corresponding RMSE values were 3.05 and 2.75, indicating that ENT4 performed relatively better.

[36] Though the RMSE values from the ENT3 distribution were comparable with those from the ENT4 distribution and sometimes even smaller than those from the ENT4 distribution, generally the median value from ENT3 was underestimated significantly for each sample size and for each case with different skewness values. These results showed that generally ENT3 did not perform as well as the GEV and ENT4 distributions and did not model extreme values satisfactorily.

4.2.3 Summary

[37] The Monte Carlo simulation, S1, showed that generally the ENT4 distribution was comparable to the GEV distribution in modeling extreme rainfall values. Since the GEV distribution has been extensively applied for modeling extreme values, the results from the first simulation, S1, showed that the ENT4 distribution would also be a candidate for modeling extreme values. The Monte Carlo simulation, S2, showed that the performance of the ENT4 distribution was comparable with that of GEV for low skewness, especially when the sample sizes were relatively large (n ≥ 70). When the skewness was relatively high (≥2), the ENT4 distribution performed comparably to or relatively better than the GEV distribution for estimating quantiles corresponding to relatively long return periods, especially when the sample size was large. Botero and Francés [2010] also found that the GEV distribution led to large errors for quantile estimation corresponding to long return periods for high skewness. Synthetic data from other distributions (e.g., gamma distribution) were also used for comparison, and generally similar results were obtained (not presented). Thus, it can be concluded from the Monte Carlo simulation that generally ENT4 provided an alternative to the commonly used GEV distribution and should be preferable for observations with high skewness. The ENT3 distribution was not suitable for modeling extreme values.

[38] The GEV distribution can be applied to account for nonstationarity when the parameters vary with a set of covariates [Katz et al., 2002; Towler et al., 2010]. From the structure of the ENT4 distribution, the four Lagrange multipliers, λ1, λ2, λ3, and λ4, correspond to the first four moments of the data, but they do not have similar connotations of location, scale, and shape parameters. Thus, the incorporation of covariates (e.g., time) to account for nonstationarity in the analysis would be difficult, which would be a disadvantage of the ENT4 distribution, compared with the GEV distribution.
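For reference, one common way of letting the GEV parameters depend on a covariate can be written as below. This is an illustrative parameterization only and is not necessarily the one used in the cited studies; the symbols μ, σ, and ξ (location, scale, and shape) and the coefficients μ0, μ1, σ0, and σ1 are notation assumed here, with time t as the covariate and ξ ≠ 0.

```latex
\[
F(x \mid t) = \exp\left\{ -\left[ 1 + \xi \, \frac{x - \mu(t)}{\sigma(t)} \right]^{-1/\xi} \right\},
\qquad 1 + \xi \, \frac{x - \mu(t)}{\sigma(t)} > 0,
\]
\[
\mu(t) = \mu_0 + \mu_1 t, \qquad \sigma(t) = \exp\left( \sigma_0 + \sigma_1 t \right).
\]
```

Because each GEV parameter has a direct interpretation, a trend in the covariate can be attached to location or scale separately. The Lagrange multipliers of ENT4 offer no such separation, which is the difficulty noted above.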

4.3 Observed Rainfall Data

[39] To further compare the performance of the GEV and ENT4 distributions, the observed rainfall data for different durations (15-minute, 45-minute, 1-hour, 12-hour, 1-day, 7-day, and 30-day) from 40 stations were used. Using the RMSE measure, the three distributions (ENT4, GEV, and ENT3) were compared based on the observed and estimated quantiles corresponding to the same empirical cumulative probabilities, as obtained from equation (11). Note that the results from the observed data may not be as accurate as those from the Monte Carlo simulation because of data limitations and the error in approximating the cumulative probability, but they are still meaningful for a rough comparison. The number of stations for which each distribution performed the best (with the least RMSE) is shown in Table 5. For all durations, the ENT4 distribution performed the best for the largest number of stations. For example, for the annual rainfall maxima of the 12-hour duration, the ENT4 distribution performed the best for 36 of the 40 stations according to RMSE. These results show that the ENT4 distribution is a good candidate for modeling annual rainfall maxima.

Table 5. Number of Stations With the Least RMSE From Each Distribution
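| Duration | ENT4 | GEV | ENT3 |
| --- | --- | --- | --- |
| 15-minute | 33 | 7 | 0 |
| 45-minute | 36 | 3 | 1 |
| 1-hour | 36 | 4 | 0 |
| 12-hour | 36 | 4 | 0 |
| 1-day | 33 | 7 | 0 |
| 7-day | 32 | 8 | 0 |
| 30-day | 32 | 8 | 0 |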
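The station-by-station comparison behind Table 5 can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that equation (11) is the Gringorten plotting position (the formula named in section 5), and the names `gringorten`, `quantile_rmse`, and `best_distribution` are hypothetical. Each candidate distribution is represented only by its fitted quantile function.

```python
import math


def gringorten(n):
    """Gringorten plotting positions P_i = (i - 0.44) / (n + 0.12), i = 1..n."""
    return [(i - 0.44) / (n + 0.12) for i in range(1, n + 1)]


def quantile_rmse(observations, quantile_fn):
    """RMSE between sorted observations and fitted quantiles at the same P_i."""
    obs = sorted(observations)  # ascending, so rank i matches P_i
    probs = gringorten(len(obs))
    squared = [(quantile_fn(p) - x) ** 2 for p, x in zip(probs, obs)]
    return math.sqrt(sum(squared) / len(obs))


def best_distribution(observations, candidates):
    """Return the name of the candidate with the least RMSE, plus all scores.

    `candidates` maps a label (e.g. "ENT4", "GEV") to a fitted quantile function.
    """
    scores = {name: quantile_rmse(observations, fn)
              for name, fn in candidates.items()}
    return min(scores, key=scores.get), scores
```

Applying `best_distribution` to the annual maxima of each station and counting how often each label is returned would give one row of a table like Table 5.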

5 Application

[40] The entropy-based distribution was used to fit the rainfall data in section 2, as shown in Figures 2, 4, and 5. These figures show that the entropy-based distribution (ENT4) fitted the empirical histograms well for the rainfall data of different durations, climate zones, and different distances from the Gulf of Mexico.

[41] The GEV distribution was also applied here for further comparison with the ENT4 distribution. For each duration (15-minute, 45-minute, 1-hour, 12-hour, 1-day, 7-day, and 30-day), a total of 10 data sets were used in each climate zone (except in the SA climate zone, where six data sets were used for the 15-minute and 45-minute durations due to data limitation). The number (and percentage) of stations for which ENT4 performed better than GEV in different climate zones is shown in Table 6. Taking the CS climate zone as an example, the ENT4 distribution performed better for at least 8 out of 10 data sets (≥80%) for every duration.

Table 6. Number of Stations (and Percentage) With Better Performance From ENT4 Distribution for Different Climate Zones and Durations
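| Duration | CS Number | CS Percentage (%) | SA^a Number | SA^a Percentage (%) | SH Number | SH Percentage (%) |
| --- | --- | --- | --- | --- | --- | --- |
| 15-minute | 9 | 90 | 4 | 67 | 10 | 100 |
| 45-minute | 10 | 100 | 6 | 100 | 9 | 90 |
| 1-hour | 9 | 90 | 9 | 90 | 10 | 100 |
| 12-hour | 9 | 90 | 10 | 100 | 8 | 80 |
| 1-day | 8 | 80 | 7 | 70 | 9 | 90 |
| 7-day | 8 | 80 | 8 | 80 | 7 | 70 |
| 30-day | 8 | 80 | 8 | 80 | 7 | 70 |

^a For 15- and 45-minute data of the SA climate zone, only six stations are selected due to data limitation.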

[42] The ENT4 distribution was also compared with the GEV distribution for different distances from the Gulf (groups I and II), with a total of 10 data sets in each group. There were not enough stations with a relatively long record of 15-minute data in group I, and thus only the hourly (1- and 12-hour) and daily (1-, 7-, and 30-day) data were used. The number (and percentage) of stations for which ENT4 performed better than GEV in the two groups is shown in Table 7. It can be seen that the ENT4 distribution generally performed better than the GEV distribution. Taking the 1-hour data as an example, the ENT4 distribution had a lower RMSE in 10 cases (100%) for group I and in 8 cases (80%) for group II.

Table 7. Number of Stations (and Percentage) With Better Performance From ENT4 Distribution for Different Distances and Durations
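| Duration | Group I Number | Group I Percentage (%) | Group II Number | Group II Percentage (%) |
| --- | --- | --- | --- | --- |
| 1-hour | 10 | 100 | 8 | 80 |
| 12-hour | 5 | 50 | 9 | 90 |
| 1-day | 9 | 90 | 9 | 90 |
| 7-day | 9 | 90 | 9 | 90 |
| 30-day | 10 | 100 | 9 | 90 |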

[43] An IDF curve relates the rainfall intensity over a given duration, d, to the return period. The distribution of annual rainfall maxima can therefore be employed for the construction of IDF curves [Singh, 1992], which are used in hydraulic design, for example, of storm sewers and parking lots. The hourly annual rainfall data for station 418583 were used to construct the IDF curves, as shown in Figure 7. The empirical return period (TE) was obtained from the Gringorten plotting position formula as TE = 1/(1 − P), where P is the nonexceedance probability. The empirical return periods were also plotted on the IDF curves. (The theoretical return period was extrapolated only to around the highest empirical return period to avoid large errors.) Note that the accuracy of the empirical return period for the highest-ranked peak flows is limited [Stedinger et al., 1993; Beckers and Alila, 2004]. Generally, the return periods from the IDF curves fitted the empirical return periods well. For example, for a return period of 12.2 years at the 1-hour duration, the theoretical rainfall quantile from the ENT4 distribution was 2.6 inches, while the observed quantile was 2.4 inches.

Figure 7. IDF curves of different durations for station 418583.
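The empirical return period can be written out explicitly by substituting the Gringorten plotting position into TE = 1/(1 − P). In the worked lines below, i is the ascending rank of an annual maximum and n is the record length; the value n = 50 is a hypothetical record length chosen only for illustration and is not taken from station 418583.

```latex
\[
P_i = \frac{i - 0.44}{n + 0.12},
\qquad
T_{E,i} = \frac{1}{1 - P_i} = \frac{n + 0.12}{n - i + 0.56},
\qquad i = 1, \dots, n .
\]
\[
\text{Largest value } (i = n),\ n = 50: \qquad
T_{E,n} = \frac{50 + 0.12}{0.56} = 89.5 \ \text{years}.
\]
```

Under this formula the largest value in a record is assigned a return period of roughly 1.8 times the record length on the basis of a single observation, which is consistent with the limited accuracy noted for the highest-ranked values.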

6 Conclusions

[44] Frequency characteristics of annual rainfall maxima from different stations in Texas are analyzed.

[45] Results show that frequency distributions of annual rainfall maxima are highly skewed for short durations, such as 15-minute, and tend to be smoothed when the duration is relatively long. The distributions also show different patterns across different climate zones. In northern and western parts, such as the CS and SA climate zones, distributions are sharp; however, they are relatively smooth in the southeast, such as the SH climate zone. The possible reason is that in the CS and SA climate zones, heavy rainfall is mainly produced by thunderstorms, while in the SH climate zone the moisture from the Gulf of Mexico is the moderating factor. For the other climate zones, no clear pattern is found, which may be due to the mixed effect of different rainfall mechanisms. The frequency distribution of rainfall near the Gulf of Mexico is smoother than that far away from the Gulf. The reason may be that the Gulf of Mexico serves as the moisture source.

[46] An entropy-based distribution is proposed for frequency analysis of annual rainfall maxima. Monte Carlo simulation based on the synthetic data from different distributions shows that generally the ENT4 distribution is comparable with the GEV distribution and is preferable for the data sets with high skewness. Furthermore, the ENT4 distribution performs better for most cases than the GEV distribution in modeling the quantiles based on the observed rainfall data. These results from the synthetic data and observed rainfall data show that the ENT4 distribution is a good candidate to model the annual rainfall maxima. The ENT4 distribution is then applied to the frequency distribution of annual rainfall maxima of different durations, climate zones, and distances from the Gulf, and further comparison between the ENT4 and GEV distributions shows that the ENT4 distribution performs well in modeling extreme rainfall. Analysis of the changing patterns of rainfall distribution with the duration, climate zone, and distance from the Gulf of Mexico sheds some light on the analysis of rainfall of different durations in Texas.

Acknowledgments

[47] This work was financially supported, in part, by the United States Geological Survey (USGS; Project ID: 2009TX334G) and TWRI through the project “Hydrological Drought Characterization for Texas under Climate Change, with Implications for Water Resources Planning and Management,” and, in part, by the National Research Foundation Grant funded by the Korean Government (MEST) (NRF-2009-220-D00104).
