Keywords:

  • Count data regression;
  • generalized additive model (GAM);
  • hurricane;
  • power system reliability

Abstract

Electric power is a critical infrastructure service after hurricanes, and rapid restoration of electric power is important in order to minimize losses in the impacted areas. However, rapid restoration of electric power after a hurricane depends on obtaining the necessary resources, primarily repair crews and materials, before the hurricane makes landfall and then appropriately deploying these resources as soon as possible after the hurricane. This, in turn, depends on having sound estimates of both the overall severity of the storm and the relative risk of power outages in different areas. Past studies have developed statistical, regression-based approaches for estimating the number of power outages in advance of an approaching hurricane. However, these approaches have either not been applicable for future events or have had lower predictive accuracy than desired. This article shows that a different type of regression model, a generalized additive model (GAM), can outperform the types of models used previously. This is done by developing and validating a GAM based on power outage data during past hurricanes in the Gulf Coast region and comparing the results from this model to the previously used generalized linear models.

1. INTRODUCTION

Hurricanes have caused severe damage to the electric power system throughout the Gulf Coast region of the United States, and electric power is critical to posthurricane disaster response as well as to long-term recovery for impacted areas. Power outage estimates form the basis of prestorm decisions that utility companies must make about the number of crews to request through mutual aid agreements and the locations at which crews and materials should be staged in preparation for a recovery effort. Requesting extra crews is costly, and placing them as near as possible to the locations where the worst damage will occur is critical to the rapid restoration of electric power. Managing power outage risk and properly preparing for poststorm recovery efforts requires rigorous methods for estimating the number and location of power outages before a storm makes landfall. These estimates must be geographically detailed and accurate while also accounting for the complicated relationships between a number of possible explanatory variables and power outages.

Previous research in estimating power outages during hurricanes has developed regression models that are fit based on geographically detailed records of power outages during past hurricanes and then used to predict where power outages are most likely during future hurricanes. The general process for using such models in practice for an approaching hurricane consists of the following steps:

  1. Utilize hurricane track, intensity forecasts, and reconnaissance data from the National Hurricane Center (NHC) or other sources to formulate a small number of scenarios, each consisting of a different estimate of where the hurricane is going, how strong it will be, and what its physical characteristics such as central pressure difference and size will be.
  2. Run a hurricane wind field simulation model for each of the model scenarios selected in the first step to provide a forecast of the time-varying wind field over the area of concern. This is an input to the outage forecasting model.
  3. For some of the past models, estimate prestorm soil moisture levels and prestorm deviation of precipitation from long-term average precipitation. This is an input to the outage forecasting model.
  4. Gather estimates of basic hurricane properties at landfall: radius of maximum winds, central pressure difference, and time since the last hurricane landfall.
  5. Use the regression model to forecast outages with the collected data in each of a large number of small geographic units covering the utility's service area.
  6. Summarize and plot the outages to support decision making.

Previous research in this area(1–3) has used a particular form of regression model, a generalized linear model (GLM), that assumes a linear relationship between the available explanatory variables and the log of the mean of y, where y is the number of power outages. However, these models have been shown to substantially overestimate the number of power outages in urban areas in some cases,(3) and they do not account for nonlinearity in the relationships between the explanatory variables and the response variable.

This article presents an innovative use of a generalized additive model (GAM) to reanalyze a data set consisting of power outages in 6,681 grid cells each of size 3.66 km (12,000 feet) by 2.44 km (8,000 feet) during five hurricanes in the service area of a large, investor-owned utility company. This is the same data set analyzed in Han et al. (2009). This reanalysis shows that GAMs can provide more accurate predictions of the number of power outages in each geographic area of a utility company's service area and a better understanding of the response of the system than GLMs do, at least for the data set analyzed in this article. The explanatory variables used in the regression model include information about the (1) winds experienced in each grid cell during each hurricane, (2) long-term precipitation and soil moisture levels in each grid cell at the time of the hurricane, (3) power system components in each grid cell, and (4) land use and land cover in each grid cell. The results have important implications for risk analysis for other types of infrastructure systems where regression models are becoming more commonly used in the risk analysis process.

2. BACKGROUND

One common approach in developing models for predicting infrastructure performance based on past data is to use statistical regression models.(1,3–6) While some suggest the use of linear regression models,(5) these models are inappropriate for count data. The assumed conditional distribution for the data is not correct, and the standard assumptions about the distributional form of the residuals do not hold. For example, in an ordinary linear least squares regression model the errors are assumed to be homoscedastic while with count data, the magnitude of the error often increases with the magnitude of the observed value. GLMs and GAMs provide a better basis for regression modeling of count data.(7,8) In this section, we provide an overview of these two classes of models. The discussion of GLMs in this section follows that of Han et al. (2009)(3) closely to aid in comparisons.
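The point about homoscedasticity can be illustrated with a small simulation. The sketch below is not taken from the article; it assumes Poisson-distributed counts (a simplification, since the article's models are negative binomial) and uses hypothetical means of 1, 5, and 25. It only shows that the spread of count data grows with the mean, which is what violates the constant-variance assumption of ordinary least squares.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_variance(values):
    """Unbiased sample variance."""
    avg = sum(values) / len(values)
    return sum((v - avg) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)  # fixed seed so the run is repeatable
variance_by_mean = {}
for mean in (1.0, 5.0, 25.0):  # hypothetical mean outage counts
    draws = [sample_poisson(mean, rng) for _ in range(5000)]
    variance_by_mean[mean] = sample_variance(draws)
    print(f"mean={mean:5.1f}  sample variance={variance_by_mean[mean]:6.2f}")
```

For a Poisson variable the variance equals the mean, so the printed variances should rise roughly in step with the means.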

2.1. Generalized Linear Models

A GLM consists of three components. The first component, the random component, specifies the behavior of the response variable for a fixed set of predicting variables by allowing the use of any distribution from the exponential family as given by Equation (1):

  • $f(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$  (1)

where $\theta$ is the natural parameter (e.g., a function of the mean), $\phi$ is the scale parameter related to the dispersion of the distribution, y is the response variable, and a() and c() are functions determined by the particular probability mass function being utilized.(9) The function b() plays a role in determining the relationship between the mean of the distribution and the observed data. The second component, the systematic component, of a GLM specifies the predicting variables and the relationship between them and the link parameter η as shown in Equation (2), where xi is one element of the matrix x, and βi is the regression coefficient corresponding to xi.

  • $\eta = \sum_{i} \beta_i x_i$  (2)

The final component links the systematic and random components. For example, the mean $\mu$ may be related to the natural parameter $\theta$ by $\mu = b'(\theta)$, where b() is a predefined function, $b'()$ is its derivative, and $b'^{-1}()$ is the inverse function of $b'()$.(9) This link component generally is of the form:

  • $g(\mu) = \eta$  (3)

where g is referred to as the link function. A simple example of a GLM is the Poisson GLM given by Equations (4) and (5) below.

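  • $P(Y = y) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots$  (4)
  • $\log(\lambda) = \sum_{i} \beta_i x_i$  (5)

where $\lambda$ is the mean of the Poisson distribution, so that the link function g in this case is the natural logarithm.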
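To make the Poisson GLM concrete, the following is a minimal sketch of fitting a log-link Poisson regression with one covariate and an intercept by Newton-Raphson. It is an illustration only, not the fitting procedure used in the studies cited here; the function name `fit_poisson_glm` and the toy data are hypothetical, and the intercept is written as a separate coefficient rather than absorbed into the sum.

```python
import math

def fit_poisson_glm(x, y, iterations=50, tol=1e-12):
    """Fit log(lambda) = b0 + b1 * x by Newton-Raphson on the Poisson log likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Solve the 2 x 2 system (information) * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical toy data: a covariate and outage counts in five grid cells.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_y = [1, 2, 4, 7, 12]
print(fit_poisson_glm(toy_x, toy_y))
```

Because the log link makes the mean multiplicative in the covariate, the fitted slope is interpreted as a change in the log of the expected count per unit change in x.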

GLMs have been used for modeling power outages during hurricanes.(1,3) Liu et al. (2005)(2) also used a generalized linear mixed model (GLMM) to examine the importance of spatial correlation in statistical power outage estimation models. In this article, we ignored spatial correlation in the data beyond that captured by the covariates in order to focus on the difference due to the change from the GLM framework to the GAM framework. Reexamining the possible importance of spatial correlation within the GAM framework is one possible future extension of the current work. While GLMs and GLMMs can provide useful predictions of power outages during hurricanes, the assumption that the systematic component is linear in the explanatory variables limits their ability to fit complex data sets such as those relating to power outages during hurricanes. In such cases, the relationship between the explanatory variables and the link-transformed mean may be highly nonlinear. GAMs provide a flexible method for incorporating such nonlinear relationships.

2.2. Generalized Additive Models

A GAM is composed of a random component, an additive component, and a link function. A GAM is different from a GLM in that an additive predictor replaces the linear predictor. That is, the linear form $\sum_j \beta_j x_j$ is replaced with the additive form $\sum_j f_j(x_j)$, where $f_j(x_j)$ is a function that smooths the jth component of x. More specifically, a GAM generally assumes that the response y has a distribution with the mean μ = E[Y | X1, …, Xp] linked to the predictor via a link function:

  • $g(\mu) = \beta_0 + \sum_{j=1}^{p} f_j(X_j)$  (6)

where $\beta_0$ is an intercept and each $f_j$ is a smoothing function of a specified class of functions estimated nonparametrically.(9) Common classes of smoothing functions are regression splines (often cubic) and tensor product splines. While the nonparametric form of $f_j$ makes the model more flexible, additivity is retained and allows one to fit the model in much the same way as GLMs. This approach allows the form of the relationship between the explanatory variables and the measure of interest, here power outages during hurricanes, to be estimated directly from the data.
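The structure of an additive predictor can be sketched in a few lines. The example below only evaluates a GAM-style mean for given coefficients; it does not estimate the smooths. It assumes a log link and a truncated power basis for the cubic regression spline, which is a simple stand-in and not necessarily the basis used by the software in the article. All function names, knots, and coefficients are hypothetical.

```python
import math

def cubic_spline_basis(x, knots):
    """Truncated power basis for a cubic regression spline (no intercept column)."""
    return [x, x ** 2, x ** 3] + [max(0.0, x - k) ** 3 for k in knots]

def smooth_term(x, knots, coefficients):
    """Evaluate one smooth f_j(x_j) as a weighted sum of basis functions."""
    basis = cubic_spline_basis(x, knots)
    return sum(c * b for c, b in zip(coefficients, basis))

def gam_mean(row, intercept, terms):
    """Mean under a log link: mu = exp(intercept + sum_j f_j(x_j))."""
    eta = intercept + sum(
        smooth_term(x, knots, coefs) for x, (knots, coefs) in zip(row, terms)
    )
    return math.exp(eta)

# Hypothetical smooths: the first bends at a knot at 0, the second is purely linear.
terms = [([0.0], [0.4, 0.0, 0.0, -0.3]), ([], [0.1, 0.0, 0.0])]
mu = gam_mean([1.0, 2.0], intercept=-1.0, terms=terms)
print(mu)
```

When every smooth has only a nonzero linear coefficient, the predictor collapses to the linear form of a GLM, which is the sense in which a GAM generalizes it.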

3. DESCRIPTION OF DATA

The occurrence of power outages during hurricanes depends on a number of factors that influence the vulnerability of electric power systems to outages. Some of these factors include power system exposure measures such as the length of distribution line and the number of poles, switches, transformers, and customers in different geographic areas. Other factors include geographic information such as land use and land cover data and climatic factors such as the duration and intensity of the hurricane, long-term precipitation patterns in the area, and soil moisture levels before a hurricane makes landfall. Soil moisture, in particular, may be important to help account for the stability of the foundation of utility poles and trees that may fall onto power lines and poles.

Our model is based on data provided by a large, investor-owned utility company in the central Gulf Coast region, and it is the same data set analyzed in Han et al. (2009).(3) Here we provide a brief overview of the data used in a manner that closely follows the more complete description of the data given in Han et al. (2009).(3)

The service area of this utility company was covered with the 6,681 [3.66 km (12,000 feet) by 2.44 km (8,000 feet)] grid cells mentioned above and these grid cells form the unit of analysis in our work. We used data on outages in the service area during five hurricanes: Danny (1997, 627 outages), Dennis (2005, 4,840 outages), Georges (1998, 1,705 outages), Ivan (2004, 13,568 outages), and Katrina (2005, 10,105 outages). Although this was all of the past outage data that were available, it is a larger data set than is available from many utilities. These outage data were combined with information about the power system, geographic characteristics of the service area, nonhurricane climatic data, and hurricane characteristics from a hurricane wind field simulation model and publicly available hurricane data. While the model developed based on these data is not directly applicable to other areas, it does provide insights into the benefits of a GAM-based approach over a GLM-based approach as well as general insights into the relative influence of the different explanatory variables in the Gulf Coast region of the United States.

3.1. Hurricane Characteristic Data

In order to capture the characteristics of the wind field during a hurricane, we used estimates of the maximum three-second gust wind speed and the length of time that the winds were above 20 m/s (44.7 miles per hour) for each grid cell based on the hurricane wind field model developed by Huang et al. (2001).(10) In this hurricane model, reconnaissance flight data are used to develop an estimate of the gradient wind field based on the model of Georgiou(11) and the hurricane decay model of Vickery and Twisdale.(12) This model produces an estimate of the gradient wind speed throughout the duration of a hurricane at the center of each grid cell. This estimated wind speed was then converted to a “surface wind speed,” the wind speed estimated at a height of 10 m in an assumed open exposure location (e.g., an airport), by using a multiplicative gradient-to-surface conversion factor. The gradient-to-surface conversion factors used were 0.72 for sites more than 10 km from the coast, 0.80 for sites within 10 km of the coast, and 0.90 for sites adjacent to the ocean, as suggested by Rosowsky et al. (1999).(13) This wind field model has been evaluated for six recent hurricanes (Fran, Bonnie, Dennis, Floyd, Isabel, and Charley),(14) with model-predicted wind speeds generally showing strong agreement with measured wind speeds. This model has previously been applied to evaluate long-term wind risk in the southeastern United States(10,15) and to estimate the spatial distribution of hurricane-induced power outages in the central Gulf Coast region.(3)
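The gradient-to-surface conversion and the duration-above-threshold covariate can be sketched as follows. This is a simplified illustration, not the wind field model itself: the function names are hypothetical, "adjacent to the ocean" is represented by a caller-supplied flag because the text does not give a distance cutoff for it, and the conversion from surface wind speed to three-second gust speed is not described here and is therefore omitted.

```python
def gradient_to_surface_factor(distance_to_coast_km, adjacent_to_ocean=False):
    """Conversion factors quoted in the text (attributed there to Rosowsky et al.)."""
    if adjacent_to_ocean:
        return 0.90
    if distance_to_coast_km <= 10.0:
        return 0.80
    return 0.72

def summarize_wind(gradient_speeds_ms, step_hours, distance_to_coast_km,
                   adjacent_to_ocean=False, threshold_ms=20.0):
    """Return (peak surface wind speed, hours with surface wind above the threshold)."""
    factor = gradient_to_surface_factor(distance_to_coast_km, adjacent_to_ocean)
    surface = [factor * v for v in gradient_speeds_ms]
    # Assumption: each simulated time step contributes its full length when above threshold.
    hours_above = step_hours * sum(1 for v in surface if v > threshold_ms)
    return max(surface), hours_above

# Hypothetical hourly gradient wind speeds (m/s) at one inland grid cell.
peak, hours = summarize_wind([20.0, 30.0, 40.0, 25.0], 1.0, 50.0)
print(peak, hours)
```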

Liu et al. (2005)(1) included hurricane indicator variables, binary variables in the regression model signifying which hurricane a given outage is from. However, these variables represent only past hurricanes and are, at best, difficult to use for predicting outages from future hurricanes. In Han et al. (2009),(3) we developed additional explanatory variables that represent hurricane characteristics beyond the wind speed. The three additional parameters that we found to be most useful in the predictive GLMs were the central pressure difference (in millibars) when each hurricane makes landfall, the time (in months) since the last hurricane landfall, and the radius (in kilometers) of maximum wind speed of each hurricane. Inclusion of these variables in the model allows the hurricane indicator variables to be eliminated without any decrease in the goodness of fit to the data when using a negative binomial GLM.(3) We build from these results in this article and present models based on the three measurable hurricane characteristics, not models based on the indicator variables of Liu et al. (2005).(1)

3.2. Land Cover

We also used information about land cover and land use to capture differences in outage rates for different land uses. The land cover data we used are publicly available from the National Land Cover Database 2001(16) and consist of data with a resolution of 1 arc-second (approximately 30 m) for each of 21 land cover classes. We grouped the 21 land cover classes into eight aggregated land cover types: water, developed (including residential, commercial, and industrial), barren, forest, scrub, grass, pasture, and wetland.

3.3. Nonhurricane Climatic Data

As in Han et al. (2009),(3) we also used information about soil moisture, antecedent precipitation, and mean annual precipitation to assist in explaining the variability of outages. Soil moisture and antecedent precipitation were included because extremely wet conditions are thought to increase the likelihood of poles and trees being blown down during hurricanes. Conversely, extremely dry conditions may make trees more susceptible to snapping. Soil moisture was simulated at 1/2 degree (latitude/longitude) resolution using the variable infiltration capacity (VIC) model.(17–19) Soil characteristics were extracted from the State Soil Geographic (STATSGO) database created by the Natural Resource Conservation Service. The STATSGO database was designed for broad planning and management uses that cover state, regional, and multistate areas. It was produced by generalizing the detailed soil survey data to a mapping scale of 1:250,000. The number of soil polygons per quadrangle map ranges from 100 to 400 and the minimum area mapped is roughly 625 hectares. Detailed investigations of model performance have demonstrated that VIC can accurately simulate the wetting and drying of the soil.(20,21) Antecedent precipitation prior to hurricane landfall was quantified using the Standardized Precipitation Index (SPI).(22,23) The SPI is a statistical measure of the deviation of precipitation from normal conditions and it can be calculated for any time period of interest. The SPI was calculated for six different time periods (1, 2, 3, 6, 12, and 24 months) using monthly precipitation data (1915–2005) at 1/2 degree (latitude/longitude) resolution.

Mean annual precipitation and potential evapotranspiration are related to the types of natural vegetation that tend to grow in an area.(24) Since some types of trees, such as pines, may be more susceptible to being blown onto power lines during a hurricane, it is important to try to account for spatial variations in vegetation. Since detailed vegetation data are not available on a state-wide basis, we used mean annual precipitation to represent the spatial variability in the distribution of plant communities. Mean annual precipitation (in millimeters) was calculated at 1/2 degree (latitude/longitude) resolution using daily precipitation acquired from the National Oceanic and Atmospheric Administration Cooperative Observer network data (1915–2004). Of course, other factors such as soil fertility and human activities strongly influence the distribution of vegetation, and so mean annual precipitation is only a proxy variable that accounts for the broad patterns in vegetation. The soil moisture, SPI, and mean annual precipitation data were all downscaled to the utility company grid using an inverse-distance weighting algorithm (radius of influence = 100 km).
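Inverse-distance weighting with a radius of influence can be sketched as below. The text gives only the 100 km radius; the distance exponent of 2, the use of planar coordinates in kilometers, and the function name `idw_interpolate` are assumptions made for illustration.

```python
import math

def idw_interpolate(target_xy, sources, radius_km=100.0, power=2.0):
    """Interpolate a value at target_xy from sources given as ((x_km, y_km), value)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), value in sources:
        distance = math.hypot(target_xy[0] - sx, target_xy[1] - sy)
        if distance > radius_km:
            continue  # outside the radius of influence
        if distance == 0.0:
            return value  # target coincides with a source point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return None  # no source point within the radius
    return weighted_sum / weight_total

# Hypothetical coarse-grid values (e.g., mean annual precipitation in mm).
coarse = [((-30.0, 0.0), 1400.0), ((30.0, 0.0), 1600.0), ((250.0, 0.0), 900.0)]
print(idw_interpolate((0.0, 0.0), coarse))
```

Points closer to the grid cell center receive larger weights, and points beyond the radius are ignored entirely.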

3.4. Power System Data

In addition to the information discussed above, we included information about the power system obtained from the utility company. This includes the number of transformers, poles, switches, and customers and the length of overhead line in each grid cell. This information was included to provide a measure of the exposure of the system to high winds during hurricanes, with the hypothesis being that more system elements and more overhead line in a grid cell would be associated with higher numbers of power outages.

We also obtained the number of power outages in each grid cell during each of the hurricanes, our response variable, from the utility company. A power outage is defined as the activation of a protective device, and it includes only prolonged loss of power. That is, the utility company's definition of an outage excluded very short duration outages that were automatically cleared (restored) by protective devices within the power system. A single outage can affect varying numbers of customers.

3.5. Principal Components Transformation

A common problem encountered when fitting regression models to large data sets concerning the performance of infrastructure systems during disasters is that the covariates may be highly correlated, making it difficult to draw inferences based on the estimated regression parameters (for a GLM) or functional forms (for a GAM). High degrees of collinearity can lead to misestimation of standard errors for regression coefficients, leading to inaccurate parameter p-values. It can also lead to problems with the predictive accuracy of the fitted model for future data sets. There are two main approaches for overcoming this difficulty: (1) changing the model used to provide stable estimates despite the collinearity in the data or (2) transforming the data to remove correlation problems. The data we used did have a high degree of collinearity, and we used the same principal components analysis (PCA) transformation of the data as in Han et al. (2009).(3) The PCA was done by a singular value decomposition of the complete (not segmented) standardized data to obtain PCs for the covariance matrix. While it is possible to use PCA for data reduction, we used all 26 of the transformed covariates to assure no loss of explanatory information in the regression model. Note, however, that basing our regression analysis on PCA-transformed variables precludes direct inferences about variable importance, particularly with a GAM where the transformed variables are further transformed through the use of smoothing splines.
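The decorrelating effect of the principal components transformation can be seen in the smallest possible case. The sketch below handles only two standardized covariates, for which the principal directions of the correlation matrix are always proportional to (1, 1) and (1, -1); it is not the singular value decomposition of all 26 covariates used in the article. The variable names and values are hypothetical.

```python
import math

def standardize(values):
    """Center to mean zero and scale to unit sample standard deviation."""
    n = len(values)
    avg = sum(values) / n
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / (n - 1))
    return [(v - avg) / sd for v in values]

def correlation(a, b):
    """Sample Pearson correlation."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / (len(a) - 1)

def two_variable_pca(a, b):
    """Scores along the (1, 1) and (1, -1) principal directions of two standardized variables."""
    za, zb = standardize(a), standardize(b)
    root2 = math.sqrt(2.0)
    pc_sum = [(p + q) / root2 for p, q in zip(za, zb)]
    pc_diff = [(p - q) / root2 for p, q in zip(za, zb)]
    return pc_sum, pc_diff

# Hypothetical grid-cell covariates that are strongly collinear.
poles = [12.0, 30.0, 45.0, 60.0, 82.0, 95.0]
line_km = [1.1, 2.9, 4.8, 5.9, 8.5, 9.2]
pc_sum, pc_diff = two_variable_pca(poles, line_km)
print(correlation(poles, line_km), correlation(pc_sum, pc_diff))
```

The original covariates are highly correlated while the transformed scores are uncorrelated, at the cost that each score mixes both original variables and is harder to interpret.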

4. RESULTS

4.1. Summary of Previous GLM Results

Figs. 1 and 2 show the actual number of outages during Hurricane Katrina and the number of power outages predicted with the GLM from Han et al. (2009).(3) The model predictions are reasonably accurate, but the model suffers from overprediction in the main urban areas combined with underprediction in the rural areas. Examination of the results suggested that this problem of underpredicting rural areas and overpredicting urban areas was likely due to a lack of linearity in the relationship between outages and the miles of overhead line in each grid cell. A formal analysis of these results, including model fit analysis and prediction error analysis based on hold-out sample validation, is given in Han et al. (2009).(3) The difference in the geographic pattern of the predictions is troubling because one of the two main intended uses of the model is to help the utility company guide the allocation of repair crews between different geographic areas based on relative differences in the predicted number of outages.

Figure 1. Actual number of outages during Hurricane Katrina.

Figure 2. Number of outages predicted with the GLM of Han et al. (2009)(3) for Hurricane Katrina.

4.2. GAM Fitting Process

In an effort to capture nonlinearity in the relationship between the explanatory variables and the log of the mean number of outages, and to provide better predictions of power outages during hurricanes, we fit negative binomial GAMs to the data described above using the program R with cubic regression splines for our smoothing functions.(25) Fig. 3 shows the fitted splines for the first four principal components, showing nonlinearity in the relationship between these principal components and the log of the mean number of power outages. The tick marks on the x-axis represent the locations of the data points in the dimension of the x-axis, solid curves are the function estimates, and dashed curves are the 95% confidence intervals for each function. For example, the first subplot indicates considerable nonlinearity in the relationship between the first principal component and the log of the mean number of power outages. In contrast, GLMs such as those developed by Liu et al. (2005)(1) and Han et al. (2009)(3) assume a linear relationship.

In using cubic regression splines in the program R, every data point was considered to be a potential knot location. The smoothing parameters, one per spline, and the spline fits themselves are iteratively calculated in order to minimize the generalized cross-validated deviance of the fitted model.(26) We began with one single-term spline per explanatory variable and iteratively removed splines in order of decreasing p-value until we were left with only splines that were statistically significant at a 0.05 level before then testing the predictive accuracy of the models. Because we wanted to keep the models simple and to ease comparisons with Han et al. (2009),(3) where interactions among covariates were not included, we did not consider higher order splines. We formally compared all of the models that were fit to the data on the basis of a deviance-based generalized cross-validation (GCV), selecting the model with the lowest GCV as the best fit to the data. The deviance-based GCV used in this study is defined as:

  • $\mathrm{GCV} = \frac{\frac{1}{n} \sum_{i=1}^{n} D(y_i; \hat{\mu}_i)}{\left[ 1 - \mathrm{tr}(R)/n \right]^{2}}$  (7)

where D is the deviance, $\hat{\mu}_i$ is the fitted value at the ith point, n is the number of observations, and R is the weighted additive-fit operator corresponding to the last iteration of the local-scoring procedure.(26) The deviance, D, of a fitted model is given by:

  • $D = -2 \left( LL_{fitted} - LL_{full} \right)$  (8)

where LLfitted is the log likelihood of the fitted model and LLfull is the log likelihood of a model with one parameter per observation, i.e., a model that fits perfectly.
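The deviance and the deviance-based GCV can be computed directly once fitted values and the trace of the fit operator are available. The sketch below uses the Poisson deviance for simplicity, whereas the article's models are negative binomial, and it treats the trace of R as a supplied number (an effective degrees of freedom). The function names and the numbers in the example are hypothetical.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: twice the gap between the perfect-fit and fitted log likelihoods."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # The term y * log(y / mu) is taken as zero when y is zero.
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (log_term - (yi - mi))
    return total

def deviance_gcv(deviance, n_obs, trace_r):
    """Deviance-based GCV: average deviance inflated by the model's effective size."""
    return (deviance / n_obs) / (1.0 - trace_r / n_obs) ** 2

# Hypothetical observed counts and fitted means for four grid cells.
observed = [0, 2, 5, 9]
fitted = [0.5, 2.5, 4.0, 9.5]
dev = poisson_deviance(observed, fitted)
print(dev, deviance_gcv(dev, n_obs=len(observed), trace_r=1.5))
```

For a fixed deviance, a larger trace of R raises the GCV, which is how the criterion penalizes more flexible smooths.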

Figure 3. Fitted additive splines for four principal components.

We also conducted repeated hold-out sampling of the data to test the predictive accuracy of the best-fitting GAM for hurricanes not included in the fitting data and to compare the predictive accuracy of this model with that of the best-fitting GLM from Han et al. (2009).(3) We divided the data into a fitting data set from which we removed one of the hurricanes and a validation set consisting of the data from the removed hurricane data. We fit a GAM to the fitting data set, repeated the variable selection process based on the fitting data, and then used the fitted model to predict the number of outages of each grid cell in the validation set. By repeating this process for each hurricane, we were able to estimate the predictive accuracy of the models for data not included in the fitting data set. Ideally we would have had data from a large number of hurricanes to use in the validation process, but this is not realistic. The utility company that provided our data had collected geographically detailed outage records only back to the mid-1990s, a much longer time frame of data collection than many utilities have. While not ideal, our five hold-out samples still provide a strong test of the predictive ability of the model for hurricanes not included in the fitting data set.
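The leave-one-hurricane-out procedure can be sketched as a short loop. The record layout and the `fit` and `predict` callables below are hypothetical placeholders; in the article the fitting step is a negative binomial GAM with variable selection repeated on each fitting set, which this sketch replaces with a trivial mean model purely to keep the example runnable.

```python
def leave_one_storm_out(records, fit, predict):
    """Hold out each storm in turn; return {storm: [(actual, predicted), ...]}."""
    storms = sorted({r["storm"] for r in records})
    results = {}
    for held_out in storms:
        train = [r for r in records if r["storm"] != held_out]
        test = [r for r in records if r["storm"] == held_out]
        model = fit(train)  # refit from scratch without the held-out storm
        results[held_out] = [(r["y"], predict(model, r["x"])) for r in test]
    return results

def fit_mean(train):
    """Placeholder model: the mean outage count in the fitting data."""
    return sum(r["y"] for r in train) / len(train)

def predict_mean(model, x):
    """Placeholder prediction: ignores the covariate and returns the fitted mean."""
    return model

# Hypothetical grid-cell records for three storms labeled A, B, and C.
records = [
    {"storm": "A", "x": 1.0, "y": 2},
    {"storm": "A", "x": 2.0, "y": 4},
    {"storm": "B", "x": 1.5, "y": 6},
    {"storm": "C", "x": 0.5, "y": 0},
]
results = leave_one_storm_out(records, fit_mean, predict_mean)
print(results)
```

Holding out whole storms, rather than random grid cells, is what makes the test speak to hurricanes not included in the fitting data.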

4.3. GAM Results

Table I gives the model fit diagnostics for the negative binomial GAMs. For comparison purposes, the best-fit negative binomial GLM from Han et al. (2009)(3) is also included in Table I, and it had a deviance of 18,884 on 33,379 degrees of freedom. In Table I, negative binomial GAM 0 represents the saturated model, the model with single-term splines of all PCA-transformed covariates included. Negative binomial GAM 5 includes only splines of the principal components with p-values below 0.05. We started with GAM 0 and iteratively fit reduced models. At each iteration we removed the principal component with the highest p-value and then refit the reduced model. It should be noted that if there are large correlations in the nonlinear smoothing functions included in a GAM, the standard errors can be underestimated,(27,28) leading to overstatement of parameter significance. However, with our data set we started with orthogonal principal components and used the p-values only for model selection, not parameter inference, reducing the issues associated with misestimated standard errors in GAMs.

Table I.  Comparison between NB GLM and NB GAMs; the GLM Results are from Han et al. (2009)(3)

Model                   | Deviance | Degrees of Freedom | AIC*   | Pseudo-R2* | GCV    | Variables Excluded**
Negative binomial GLM   | 18,884   | 33,379             | 53,154 | 0.8424     |        | RMW, PC 9, 13, 26
Negative binomial GAM 0 | 15,276   | 33,311             | 49,395 | 0.9990     | 1.0028 | None
Negative binomial GAM 1 | 15,276   | 33,311             | 49,395 | 0.9990     | 1.0028 | PC 17
Negative binomial GAM 2 | 15,266   | 33,314             | 49,398 | 0.9990     | 1.0027 | PC 17, 24
Negative binomial GAM 3 | 15,312   | 33,315             | 49,382 | 0.9990     | 1.0027 | RMW, PC 17, 24
Negative binomial GAM 4 | 15,280   | 33,319             | 49,395 | 0.9990     | 1.0026 | RMW, PC 11, 17, 24
Negative binomial GAM 5 | 15,281   | 33,319             | 49,396 | 0.9990     | 1.0026 | RMW, PC 11, 17, 24, 26

  1. *AIC stands for Akaike information criterion, and the pseudo-R2 is a pseudo R2 based on the overdispersion parameter α.
  2. **In the variable exclusion list, RMW stands for the radius of maximum winds and PC stands for principal component.

From Table I we see that the deviance and Akaike information criterion (AIC)(29) for the negative binomial GAMs are lower than those for the best-fit negative binomial GLM, suggesting that the GAMs fit the data better than the best-fit GLM. In addition, Table I shows that for the GAMs, all values of the pseudo-R2 based on the overdispersion parameter α are approximately one and are higher than the value for the best-fit negative binomial GLM. This suggests that the GAMs are accounting for more, and in fact nearly all, of the overdispersion. The variability that remains in the predicted counts is primarily due to the Poisson variability about the mean. Another diagnostic for comparison of models is the GCV of the regression model. While lower AIC and deviance values are generally preferable, we selected the deviance-based GCV as our primary criterion in comparing the fits of different negative binomial GAMs because of its advantages in terms of invariance.(30) Based on AIC and GCV, negative binomial GAM 4 provides the best fit to the data set.

Fig. 4 shows the outage predictions from negative binomial GAM 4 for Hurricane Katrina. Comparing this map of predicted outages with the map of the actual number of outages (Fig. 1), we see that the GAM predictions match the spatial distribution of outages much more closely than the GLM predictions do. Similar results are seen for the other four hurricanes, though they are not displayed for the sake of brevity.

Figure 4. Number of outages predicted with the GAM for Hurricane Katrina.

As mentioned above, the negative binomial GLM of Han et al. (2009)(3) overestimated the number of outages substantially in some grid cells, and these overestimates influence the overall mean squared error for the GLM. In examining the grid cells corresponding to these outliers in detail, it was noticed that the grid cells were predominantly in areas with high densities of overhead line relative to other grid cells and that these seemed to be driving the overprediction for these areas. On the other hand, Fig. 5 shows that the predicted number of outages grows approximately linearly (with associated variability) with the actual number of outages and the range of predicted number of outages is very similar to the actual number of outages for the negative binomial GAM, suggesting that the GAM overcomes the overestimation problem. Again, similar results are seen for the other hurricanes, but these results are not shown here for the sake of brevity.

Figure 5. Predicted number of outages versus actual number of outages for the best-fit negative binomial GAM for Hurricane Katrina.

To check the predictive accuracy of the GAM, hold-out tests were performed for each hurricane, and the mean absolute error (MAE), the average of the absolute differences between the actual and predicted numbers of outages, was calculated. Table II shows the MAE for each hurricane for both the GLM and the GAM. We subdivided the MAE into four categories according to the actual number of outages in order to get a more complete picture of prediction accuracy. The categorized MAE provides a measure of the prediction error for each outage range. For example, in comparing the GLM and GAM for Hurricane Katrina, the GAM outage prediction is, on average across the grid cells, approximately 2.6 times more accurate for grid cells with 0 to 1 outages, 4.77 times more accurate for grid cells with 10 to 50 outages, and slightly more accurate for grid cells with more than 50 outages. As discussed above, the GLM overestimates outages for some grid cells. In addition, the predictive accuracy of the GLM is highly variable across hurricanes. For Hurricane Dennis, the MAE of the GLM for the 10 to 50 outage range is approximately 808, while for the GAM it is approximately 11. The GAM, on the other hand, provides prediction errors that are lower than or comparable to those of the GLM for all hurricanes. Overall, the results suggest that GAMs can provide much more accurate outage predictions than GLMs across a variety of types of hurricanes, including large, powerful hurricanes like Hurricanes Katrina and Ivan and smaller, weaker hurricanes like Hurricane Danny. While there is still error in the predictions, the results provide a much better basis for allocating repair crews among the different geographic portions of the service area.

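Table II. MAEs for Hold-Out Sampling Fitted by NB GLM and NB GAM; the GLM Results Are from Han et al. (2009)(3)

| | Danny (1997) | Georges (1998) | Ivan (2004) | Dennis (2005) | Katrina (2005) |
| Actual number of outages | 627 | 1,075 | 13,568 | 4,840 | 10,105 |
| [inline image] | 0.0938 | 0.1609 | 2.0308 | 0.7244 | 1.5125 |
| MAE, GLM: 0 ∼ 1 outages | 1.3 | 0.14 | 1.7×10⁻⁷ | 0.39 | 0.21 |
| MAE, GLM: 1 ∼ 10 outages | 6.0 | 17.6 | 2.7 | 3.7 | 2.0 |
| MAE, GLM: 10 ∼ 50 outages | 8.3 | 50.1 | 17.4 | 808.4 | 75.8 |
| MAE, GLM: >50 outages | | | 72.2 | | 57.3 |
| MAE, GAM: 0 ∼ 1 outages | 0.09 | 0.10 | 3.8×10⁻⁹ | 0.38 | 0.08 |
| MAE, GAM: 1 ∼ 10 outages | 1.8 | 1.4 | 2.7 | 2.0 | 2.1 |
| MAE, GAM: 10 ∼ 50 outages | 5.4 | 13.8 | 17.4 | 10.6 | 15.9 |
| MAE, GAM: >50 outages | | | 72.2 | | 51.5 |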
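
The categorized MAE reported in Table II can be illustrated with a minimal Python sketch. The bin boundaries below (0 to 1 inclusive, then 1 < y ≤ 10, 10 < y ≤ 50, and y > 50) are an assumption, since the text does not state how boundary values are assigned, and the function name `binned_mae` is hypothetical rather than part of the authors' analysis.

```python
import math

# Outage categories keyed by the ACTUAL outage count in a grid cell.
# Boundary handling is an assumption: the first bin includes both ends,
# later bins are open on the left and closed on the right.
BINS = [("0-1", 0, 1), ("1-10", 1, 10), ("10-50", 10, 50), (">50", 50, math.inf)]

def binned_mae(actual, predicted):
    """Mean absolute error per outage category; NaN for categories with no grid cells."""
    sums = {label: [0.0, 0] for label, _, _ in BINS}
    for a, p in zip(actual, predicted):
        for i, (label, lo, hi) in enumerate(BINS):
            in_bin = (lo <= a <= hi) if i == 0 else (lo < a <= hi)
            if in_bin:
                sums[label][0] += abs(a - p)  # accumulate absolute error
                sums[label][1] += 1           # count cells in this category
                break
    return {label: (s / n if n else math.nan) for label, (s, n) in sums.items()}

# Hypothetical hold-out cells: actual counts and model predictions.
result = binned_mae([0, 1, 5, 20, 60], [0.5, 1.5, 3.0, 30.0, 50.0])
```

An empty category yields NaN, which corresponds to the blank cells in Table II for hurricanes with no grid cells above 50 outages. As a check on the ratios quoted in the text, the Katrina entries give 0.21/0.08 ≈ 2.6 for the 0 to 1 range and 75.8/15.9 ≈ 4.77 for the 10 to 50 range.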

5. CONCLUSION

This study has improved the accuracy of models for estimating the spatial distribution of power outages during an approaching hurricane through the use of GAMs. This will in turn help utility companies improve their posthurricane response through improved prehurricane allocation of repair crews to different portions of the service area. This article has also demonstrated that large, state-wide data sets concerning the performance of power systems during past hurricanes can be used to accurately estimate the number and spatial distribution of power outages during future hurricanes. Furthermore, it has shown that semiparametric GAMs can provide substantially improved accuracy in power outage estimates relative to GLMs. This work can provide a basis for improving prehurricane planning for posthurricane response, and it can provide a basis for future research to further improve power outage estimation models for hurricane-prone areas.

ACKNOWLEDGMENTS

This work was partially funded by a private utility company that wishes to remain anonymous. This utility also provided the data used in the analysis. We gratefully acknowledge its support. All opinions expressed in this article are those of the authors and do not necessarily reflect the positions of this sponsor.

REFERENCES

1. Liu H, Davidson RA, Rosowsky DV, Stedinger JR. Negative binomial regression of electric power outages in hurricanes. Journal of Infrastructure Systems, 2005; 11(4):258–267.
2. Liu H, Davidson RA, Apanasovich TV. Spatial generalized linear mixed models of electric power outages due to hurricanes and ice storms. Reliability Engineering and System Safety, 2008; 93(6):897–912.
3. Han SR, Guikema S, Quiring S, Lee KH, Davidson R, Rosowsky D. Estimating the spatial distribution of power outages during hurricanes in the Gulf Coast region. Reliability Engineering and System Safety, 2009; 94(2):199–210.
4. Ariaratnam ST, El-Assaly A, Yang Y. Assessment of infrastructure inspection needs using logistic models. Journal of Infrastructure Systems, 2001; 7(4):160–165.
5. Kleiner Y, Rajani BB. Comprehensive review of structural deterioration of water mains: Statistical models. Urban Water, 2001; 3(3):131–150.
6. Guikema SD, Davidson RA, Liu H. Statistical models of the effects of tree trimming on power system outages. IEEE Transactions on Power Delivery, 2006; 21(3):1549–1557.
7. Guikema SD, Coffelt JP. Modeling count data for non-linear, complex infrastructure systems. Journal of Infrastructure Systems, in press.
8. Guikema SD. Natural disaster risk analysis for critical infrastructure systems: An approach based on statistical learning theory. Reliability Engineering and System Safety, 2009; 95:855–860.
9. Hastie T, Tibshirani R. Generalized additive models. Statistical Science, 1986; 1(3):297–310.
10. Huang Z, Rosowsky DV, Sparks PR. Hurricane simulation techniques for the evaluation of wind speeds and expected insurance losses. Journal of Wind Engineering and Industrial Aerodynamics, 2001; 89:605–617.
11. Georgiou PN. Design wind speeds in tropical cyclone-prone regions. Ph.D. dissertation, University of Western Ontario, 1995.
12. Vickery PJ, Twisdale LA. Wind-field and filling models for hurricane wind-speed prediction. Journal of Structural Engineering, 1995; 121(11):1700–1709.
13. Rosowsky D, Sparks P, Huang Z. Wind Field Modeling and Hurricane Hazard Analysis. Report to the South Carolina Grant Consortium and Civil Engineering Department, Clemson University, Civil Engineering Department, 1999.
14. Lee KH, Rosowsky DV. Synthetic hurricane wind speed records: Development of a database for hazard analyses and risk studies. Natural Hazards Review, 2007; 8:23–34.
15. Huang Z, Rosowsky DV, Sparks PR. Long-term hurricane risk assessment and expected damage to residential structures. Reliability Engineering & System Safety, 2001; 74:239–249.
16. U.S. Geological Survey Landcover Institute. Available at http://landcover.usgs.gov/landcoverdata.php, Accessed on June 23, 2009.
17. Liang X, Lettenmaier DP, Wood EF, Burges SJ. A simple hydrologically based model of land surface water and energy fluxes for GCMs. Journal of Geophysical Research, 1994; 99(D7):14,415–14,428.
18. Liang X, Lettenmaier DP, Wood EF. One-dimensional statistical dynamical representation of subgrid spatial variability of precipitation in the two-layer variable infiltration capacity model. Journal of Geophysical Research, 1996a; 101(D16):21,403–21,422.
19. Liang X, Lettenmaier DP, Wood E. Surface soil moisture parameterization of the VIC-2L model: Evaluation and modifications. Global and Planetary Change, 1996b; 13:195–206.
20. Meng L, Quiring SM. A comparison of soil moisture models using soil climate analysis network (SCAN) observations. Journal of Hydrometeorology, 2008; 9:641–659.
21. Robock A, Luo L, Wood EF, Wen F, Mitchell KE, Houser PR, Schaake JC, Lohmann D, Cosgrove B, Sheffield J, Duan W, Higgins RW, Pinker RT, Tarpley JD, Basara JB, Crawford KC. Evaluation of North American land data assimilation system over the southern Great Plains during the warm season. Journal of Geophysical Research, 2003; 108:8846–8866.
22. McKee TB, Doesken NJ, Kleist J. The relationship of drought frequency and duration to time scales. International 8th Conference on Applied Climatology, 1993:179–184.
23. McKee TB, Doesken NJ, Kleist J. Drought monitoring with multiple time scales. International 9th Conference on Applied Climatology, 1995:233–236.
24. Mather JR. The Climatic Water Budget in Environmental Analysis. Farnborough, Hants, UK: Teakfield/Lexington Books, 1978.
25. Wood SN. Generalized Additive Models: An Introduction with R. Boca Raton, FL: CRC Press, 2006.
26. Hastie TJ, Tibshirani RJ. Generalized Additive Models. Boca Raton, FL: Chapman and Hall/CRC, 1990.
27. Ramsay T, Burnett R, Krewski D. The effect of concurvity in generalized additive models in time series studies of air pollution and health. American Journal of Epidemiology, 2003; 14(1):18–23.
28. Samet JM, Dominici F, McDermott A, Zeger SL. New problems for an old design: Time series analyses of air pollution and health. Epidemiology, 2003; 14(1):11–12.
29. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974; 19(6):716–723.
30. Wahba G. Spline Models for Observational Data. Philadelphia, PA: SIAM, 1990.