[1] Large storms make it difficult to extract the long-term trend of erosion or accretion from shoreline position data. Here we make storms part of the shoreline change model by means of a storm function. The data determine storm amplitudes and the rate at which the shoreline recovers from storms. Historical shoreline data are temporally sparse, and inclusion of all storms in one model overfits the data, but a probability-weighted average model shows effects from all storms, illustrating how model averaging incorporates information from good models that might otherwise have been discarded as unparsimonious. Data from Cotton Patch Hill, DE, yield a long-term shoreline loss rate of 0.49 ± 0.01 m/yr, about 16% less than published estimates. A minimum loss rate of 0.34 ± 0.01 m/yr is given by a model containing the 1929, 1962 and 1992 storms.
[2] Shoreline change models use historical shorelines to estimate the rate of change, or to predict a future shoreline, as in Figure 1a. Storm-influenced shorelines raise difficult questions in this regard because shorelines tend to recover from storms [Birkemeier, 1979; Kriebel, 1987; Morton, 1988; Morton et al., 1994]. After a storm, the shoreline change rate may return to its long-term trend [Galgano and Douglas, 2000; Zhang et al., 2002] over 5–15 years, depending on the magnitude of the storm. Should one therefore remove storm-influenced shorelines from the data when attempting to estimate the long-term trend? If so, how can one tell whether a shoreline is truly storm influenced? Here we address such questions by explicitly modeling the transient part of the storm-induced change. The storm-driven permanent change is implicitly modeled as part of the long-term trend.
[3] In modeling long-term shoreline change, data contaminated by large storms violate the least-squares assumption that errors are normally distributed. One alternative to least squares is to objectively edit storm effects from the data using methods such as least median of squares (LMS), sometimes referred to as reweighted least squares (RLS) [Rousseeuw, 1984; Rousseeuw and Leroy, 1987; Genz et al., 2007]. Another alternative is to leave storms in the data and fit the model by minimizing absolute differences rather than squared differences (LAD) [Rousseeuw and Leroy, 1987; Genz et al., 2007]. Douglas and Crowell [2000] removed storm points from the data until the misfit (average residual) was comparable to the standard error of the measurements plus 20% of beach width. They illustrated their procedure using data from Cotton Patch Hill, Delaware, which was hit by large storms in 1929, 1962, 1991 and 1992. We illustrate our procedure with the same Cotton Patch Hill data analyzed by Douglas et al. [1998] and Douglas and Crowell [2000, Table 1]. The 1991 and 1992 storms occurred between surveys in 1990 and 1993. The 1991 storm was smaller than the 1992 storm, and we model the effects of these two storms using a single storm function with onset in 1992.
2. One-Dimensional Models With Storms
[4] A storm is defined here as any event that changes the position of the shoreline suddenly, with subsequent slow recovery toward the long-term trajectory. Although seasonal changes in shoreline can be rapid, recovery usually occurs within a time much shorter than the time between historical shoreline surveys, so we regard uncorrected seasonal effects as part of the noise.
[5] The traditional 1D model for shoreline change is

y(t) = b + rt + n(t),   (1)

in which y is the cross-shore coordinate (shoreline location), b is the intercept, r is the long-term shoreline change rate, and n is uncorrelated noise with zero mean. The inverse problem for shoreline change is to infer the change rate from a time series of shoreline data y(t_{1}), y(t_{2}),…, y(t_{J}). The intercept depends on the baseline used to measure y, and on the time origin, both of which can be adjusted to condition the solution of the inverse problem. In mathematical parlance, the linear 1D model is a linear sum of the basis functions 1 and t. If acceleration is included in the model, there is a third basis function at^{2}, and Fenster et al. [1993] showed that models with acceleration can be more parsimonious than simple linear models. However, Crowell et al. [1997] showed that the acceleration term is good at fitting noise and that its inclusion in a model can lead to inaccurate predictions. This situation is a reminder of the importance of prior information in any inversion problem and that parsimony by itself does not select a good model unless candidate models are suitably chosen.
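As a concrete illustration of model (1), the rate r can be recovered by ordinary least squares on the basis functions 1 and t. The sketch below uses invented survey times, noise, and positions (not the Cotton Patch Hill data):

```python
import numpy as np

# Synthetic example of the linear 1D model y = b + r*t + n.
# Times and positions are illustrative, not the Cotton Patch Hill data.
t = np.array([1845.0, 1929.0, 1962.0, 1990.0, 1997.0])
noise = np.array([0.3, -0.2, 0.1, 0.0, -0.1])
y = 10.0 - 0.5 * (t - t[0]) + noise   # true rate r = -0.5 m/yr

# Design matrix with basis functions 1 and t; demeaning t conditions G,
# as the text suggests for columns after the first.
G = np.column_stack([np.ones_like(t), t - t.mean()])
b_hat, r_hat = np.linalg.lstsq(G, y, rcond=None)[0]
```

Because the demeaned time column is orthogonal to the column of ones, the fitted intercept here is simply the mean shoreline position.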
[6] Suppose that the data area is struck by a storm at time t_{s}. We use a basis function called the storm function, given by

f(t) = e^{−γ(t − t_{s})} for t ≥ t_{s}, and f(t) = 0 for t < t_{s},   (2)

in which γ ≥ 0 is the recovery rate (the inverse time scale of recovery). The augmented model for shoreline change is now

y(t) = b + rt + sf(t) + n(t),   (3)
in which s is the storm amplitude parameter. Since the time of the storm is known, model (3) has two more parameters than model (1). It is linear in b, r and s, but nonlinear in γ. The fact that the storm function is discontinuous at t_{s} does not make it more difficult to use, because basis functions are not required to be continuous, only independent of each other. Versions of (3) implemented with different storms or combinations of storms are regarded as different models, and we use an information criterion (below) to rate the relative goodness of models. Most historical data sets contain only 6–10 shorelines, and the information criterion usually excludes models in which each storm has its own recovery rate; therefore we use the same recovery rate parameter for all storms.
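A minimal sketch of the storm function and the augmented design matrix for model (3); the survey times, recovery rate, and storm date below are placeholders, not the actual data:

```python
import numpy as np

def storm_basis(t, t_s, gamma):
    """Storm function: zero before the storm at t_s, then a one-sided
    exponential decaying toward the long-term trend at rate gamma."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= t_s, np.exp(-gamma * (t - t_s)), 0.0)

# Illustrative survey times and parameters (not the actual data set).
t = np.array([1929.5, 1940.0, 1962.5, 1970.0, 1990.0, 1997.0])
gamma = 1.0 / 8.4      # recovery rate; 8.4 y is the paper's averaged time scale
t_storm = 1962.2       # example storm onset

# Design matrix for model (3): basis functions 1, t, and the storm function.
G = np.column_stack([np.ones_like(t), t, storm_basis(t, t_storm, gamma)])
```

The discontinuity at t_s poses no problem: each column of G is simply the basis function evaluated at the survey times.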
[7] If some storm-induced shoreline change is permanent, an appropriate model would be (3) plus a function s_{p}H(t − t_{s}) in which H is the unit step function and s_{p} is the amplitude of the permanent change. In temporally sparse historical data with multiple storms we find that such step functions can trade off with both our storm function and the rate function to such an extent that a model with only step functions fits the data fairly well. Admitting models consisting only of step functions replaces the problem of estimating a long-term rate parameter with the problem of estimating frequencies and amplitudes of storms. Moreover, models consisting only of step functions ignore the abundant evidence that storm-altered shorelines do recover to a large extent. Accordingly, for historical data we explicitly model only the transient part of the storm, leaving the permanent part as a component of the long-term trend. Although it is not needed for this paper, beach nourishment can be modeled like a storm. For nourishment that alters a shoreline the storm function is used, but for offshore nourishment, which does not immediately alter a shoreline, we use the function 1 − e^{−γ(t − t_{n})} for t ≥ t_{n} (zero otherwise), where t_{n} is the time of nourishment.
3. Linearization
[8] As the model is nonlinear in γ, we find γ by maximizing the profile likelihood [e.g., Coles, 2001, p. 34]. In order to include the effects of uncertainty in γ we then linearize the model in a neighborhood of the maximum likelihood estimate (MLE) γ̂. Since recovery rate is necessarily positive, and our noise model is Gaussian, we use μ = ln γ − ln γ̂ as a parameter in the linearized model. As ∂e^{γt}/∂ln γ = γte^{γt}, the linearized model is

y(t) = b + rt + Σ_{k=1}^{K_{s}} s_{k} f_{k}(t) − μγ̂ Σ_{k=1}^{K_{s}} ŝ_{k}(t − t_{k}) f_{k}(t) + n(t),   (4)

in which s_{k} is the amplitude of the k^{th} storm, f_{k} is the storm function of the k^{th} storm (onset t_{k}) evaluated at γ = γ̂, ŝ_{k} is the MLE of s_{k}, K_{s} is the number of storms, and the coefficient of μ is a single basis function.
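The μ derivative used in the linearization can be checked numerically: with μ = ln γ − ln γ̂, the derivative of the storm function with respect to μ at μ = 0 is −γ̂(t − t_s)e^{−γ̂(t − t_s)}. The values below are placeholders:

```python
import numpy as np

# Finite-difference check of the linearization in mu = ln(gamma) - ln(gamma_hat).
gamma_hat, t_s, t = 0.12, 1962.2, 1975.0   # illustrative values

def storm(mu):
    gamma = gamma_hat * np.exp(mu)          # gamma as a function of mu
    return np.exp(-gamma * (t - t_s))       # storm function, valid for t >= t_s

eps = 1e-6
numeric = (storm(eps) - storm(-eps)) / (2 * eps)            # central difference
analytic = -gamma_hat * (t - t_s) * np.exp(-gamma_hat * (t - t_s))
```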
4. Noise and Recovery Rate
[9] We model the noise n(t) as a zero-mean Gaussian. We assume the noise consists of observational noise (measurement noise) and process noise, and that the two noise processes are unrelated. Observational noise is estimated prior to modeling [Crowell et al., 1991; Douglas and Crowell, 2000; Fletcher et al., 2003; Genz et al., 2007]. Douglas and Crowell [2000] estimated the uncertainty in the high water line at Cotton Patch Hill as 6.5 m, for a process noise variance of (6.5 m)^{2}, but here we estimate process noise from the data. Our data covariance matrix has the form

C_{y} = C_{o} + ηC_{p},   (5)
in which C_{o} is the covariance matrix of measurement noise, C_{p} is an unscaled covariance matrix of process noise, and η is a scaling parameter to be estimated from the data. We assume that observational noise at one time is uncorrelated with observational noise at other times, so C_{o} is diagonal. Observational error ranged from 2.6 m in 1997 to 8.9 m in 1845 (Table A1 of section A in Text S1 of the auxiliary material).
[10] Process noise should be correlated, as white noise convolved with a storm function gives a covariance matrix C_{p}(i, j) = (γ/2)exp(−γ∣t_{i} − t_{j}∣), but our experiments suggest that γ is poorly resolved by the residuals in historical data sets because the data are too sparse. In the numerical calculations presented here we take C_{p} to be the identity matrix (as did Douglas and Crowell [2000]). Table 1 gives the process error for the models of this paper. The best-fit model (R, S62) has a process error of 11.2 m, roughly 35% of the beach width. The three-storm model, with its 0.1 m process error, overfits the data.
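The covariance construction of equation (5) can be sketched as follows, with invented per-survey errors; both the identity C_p used in the calculations and the correlated alternative mentioned above are shown:

```python
import numpy as np

# Data covariance C_y = C_o + eta*C_p, with invented observational errors.
t = np.array([1845.0, 1929.0, 1962.0, 1990.0, 1997.0])
sigma_obs = np.array([8.9, 6.0, 5.0, 4.0, 2.6])   # illustrative per-survey errors (m)
C_o = np.diag(sigma_obs**2)                        # diagonal: uncorrelated in time

gamma = 1.0 / 8.4
# Correlated process covariance mentioned in the text: (gamma/2)*exp(-gamma*|ti-tj|).
C_p_correlated = (gamma / 2.0) * np.exp(-gamma * np.abs(t[:, None] - t[None, :]))
C_p = np.eye(len(t))                               # identity, as used in the paper

eta = 11.2**2          # process variance for a process error of 11.2 m (Table 1)
C_y = C_o + eta * C_p
```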
Table 1. Process Error for Models

  Model                     Process Error (m)
  Averaged model            12.5
  R                         30.9
  R, S29                    30.4
  R, S62                    11.2
  R, S92                    30.2
  R, S29, S62                8.2
  R, S29, S92               29.9
  R, S62, S92                9.8
  R, S29, S62, S92           0.1
  R (no post-storm data)     9.5
[11] Our likelihood function is the usual Gaussian,

L(m, η) = (2π)^{−N/2} ∣C_{y}∣^{−1/2} exp[−(y − Gm)^{T} C_{y}^{−1}(y − Gm)/2],   (6)

in which G is the system matrix (design matrix, configuration matrix), and m is the column vector of model coefficients. For the model of equation (4), the parameter vector is m = [b, r, s_{1}, s_{2},…, μ]^{T}, and the columns of G are the basis functions evaluated at each survey time. The first column of G is all ones, the second column is [t_{1}, t_{2},…, t_{N}]^{T}, and we condition G by removing the mean from all columns after the first. Maximizing the likelihood with respect to the parameter vector m gives the usual relation

m = (G^{T}C_{y}^{−1}G)^{−1} G^{T}C_{y}^{−1} y,   (7)
and maximizing the likelihood with respect to the noise parameter η gives the nonlinear relation

tr(C_{y}^{−1}C_{p}) = (y − Gm)^{T} C_{y}^{−1} C_{p} C_{y}^{−1} (y − Gm).   (8)

The noise parameter enters both these equations through the definition C_{y} = C_{o} + ηC_{p}. We find the MLE by a 1D search: pick a value of η; compute m(η) as the solution of (7) and substitute it into (8). The value of η satisfying (8) is the MLE η̂, and m(η̂) is the MLE m̂. In practice, since the recovery rate parameter also requires a search, we find both parameters by maximizing the profile likelihood with respect to γ and η, as shown in Figure 1b. If a prior distribution were available for γ and η we would multiply it by the profile likelihood to obtain a posterior profile. In temporally sparse data sets with early storms we find that storm recovery rate γ can trade off with long-term rate r. To minimize this effect, we estimate γ separately for each model, then fix γ at its model probability-weighted average γ̄, and use that average with every model. For the Cotton Patch Hill data, γ^{−1} ranges from 7.2–12.1 y, and (γ̄)^{−1} = 8.4 y.
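The 1D search for the noise scale can be sketched on synthetic data with invented values: for each trial η the generalized least-squares solution of (7) is formed, the negative log-likelihood is evaluated, and the minimizer over the grid is taken as the MLE.

```python
import numpy as np

# Profile-likelihood search for the noise scale eta on synthetic data.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 30.0, 8)
G = np.column_stack([np.ones_like(t), t - t.mean()])
C_o = np.diag(np.full(len(t), 2.0**2))    # invented observational variances
C_p = np.eye(len(t))
y = G @ np.array([5.0, -0.5]) + rng.normal(0.0, 3.0, len(t))

def neg_log_like(eta):
    C_y = C_o + eta * C_p
    Ci = np.linalg.inv(C_y)
    m = np.linalg.solve(G.T @ Ci @ G, G.T @ Ci @ y)   # GLS solution, as in (7)
    r = y - G @ m
    return 0.5 * (np.linalg.slogdet(C_y)[1] + r @ Ci @ r)

# 1D grid search; in practice one profiles over gamma and eta jointly.
etas = np.linspace(0.1, 50.0, 500)
scores = [neg_log_like(e) for e in etas]
eta_hat = etas[int(np.argmin(scores))]
```

A grid search stands in here for the root-finding described in the text; both locate the same profile-likelihood maximum.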
[12] Equation (7) leads to the definition of a generalized inverse matrix G^{−g} such that m̂ = G^{−g}y. Thus G^{−g} = (G^{T}C_{y}^{−1}G)^{−1}G^{T}C_{y}^{−1}. If C_{o} were zero, the noise parameter would cancel out of the expression for G^{−g}, and the parameter covariance matrix would be given by the usual formula C_{m} = G^{−g}C_{y}(G^{−g})^{T}. For our noise model, the noise parameter does not cancel out of the expression for G^{−g}, and thus a data variation δy causes a corresponding variation in G^{−g} as well as in m̂. The parameter covariance matrix is derived in the auxiliary material.
5. Prediction
[13] To predict the shoreline location at time t, we use the linearized model formula (4). It is helpful to express this as y(t) = q^{T}m̂, where q is a column vector—we refer to it as a prediction kernel—containing the value of each basis function at time t. For a prediction of the long-term rate, the prediction kernel is just q = [0, 1, 0,…, 0]^{T}. For a prediction of the actual rate at time t, the elements of q are the time derivatives of the basis functions in (4). For example, suppose we want only the component of shoreline displacement due to the first storm, at time t. The first storm involves the parameters s_{1} and μ, and the prediction kernel is

q = [0, 0, f_{1}(t), 0,…, 0, −γ̂ŝ_{1}(t − t_{1}) f_{1}(t)]^{T}.   (9)
Here one might guess that the last element of q could be replaced by zero, since μ̂ is always zero at the MLE. However, the last element in q contributes to the variance by coupling the uncertainty in μ to the uncertainty of the prediction. In each case, the variance of the prediction is given by the scalar q^{T}C_{m}q, where C_{m} is the parameter covariance matrix given in the auxiliary material. Figure 1c shows the 95% confidence interval for shoreline predicted with several models.
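The quadratic form q^T C_m q can be sketched with an invented parameter covariance matrix for a three-parameter model [b, r, s]; all numbers below are placeholders:

```python
import numpy as np

# Prediction variance q^T C_m q with an invented (symmetric, positive
# definite) parameter covariance for the parameter vector [b, r, s].
C_m = np.array([[ 4.0, -0.1,  0.5],
                [-0.1,  0.01, 0.0],
                [ 0.5,  0.0,  9.0]])

q_rate = np.array([0.0, 1.0, 0.0])        # kernel selecting the long-term rate
var_rate = q_rate @ C_m @ q_rate          # equals C_m[1, 1]

gamma_hat, t_s, t = 0.12, 1962.2, 1975.0  # illustrative values
q_storm = np.array([0.0, 0.0, np.exp(-gamma_hat * (t - t_s))])
var_storm = q_storm @ C_m @ q_storm
```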
6. Information Criteria
[14] An information criterion (IC) is a function whose value increases with the sum of squared residuals and with the number of model parameters (model complexity). The best model is the one with the lowest IC value. The use of an IC prevents overfitting data with too many storms, but it is not a panacea, since the choice of basis functions affects the performance of the IC [McQuarrie and Tsai, 1998]. In a case where the true basis functions are included in the candidate basis functions, an IC that picks the true basis functions with probability 1 as the number of data approaches infinity is said to be consistent. In the case where at least some of the true basis functions are missing from the set of candidate basis functions, an IC that picks the combination of basis functions that best approximates the true model is said to be asymptotically efficient. The corrected Akaike Information Criterion (AICc) used here is asymptotically efficient [McQuarrie and Tsai, 1998].
[15] An important feature of any information criterion I is that for any positive numbers a and b, the quantity a + bI takes its minimum at the same model as I and is therefore an equally good information criterion. The AICc formula of this paper is

AICc = L̄ + 2K/(N − K − 1),   (10)

in which N is the number of data points, K is the number of model parameters, and L̄ = L̂/N − 1 − log(2π), where L̂ is −2 times the logarithm of the maximum likelihood. The second term in equation (10) is sometimes referred to as the complexity penalty or simply the penalty. The constant addends in the definition of L̄ make our AICc formula agree with the formula found in most books when our noise model is simplified to the usual noise model. For the noise model of this paper (equation (5)), L̂ is given by

L̂ = N log(2π) + log∣Ĉ_{y}∣ + (y − Gm̂)^{T} Ĉ_{y}^{−1}(y − Gm̂),   (11)

in which m̂ is the MLE of the parameter vector, and Ĉ_{y} = η̂C_{p} + C_{o} where η̂ is the MLE of the noise parameter η, and ∣ · ∣ indicates a determinant. If C_{o} = 0, the expression for L̂ simplifies to

L̂ = N log(2π) + N + log∣C_{p}∣ + N log[(y − Gm̂)^{T} C_{p}^{−1}(y − Gm̂)/N],   (12)

which is independent of η̂. If C_{o} = 0 and C_{p} is proportional to the identity matrix, L̄ simplifies to

L̄ = log[(y − Gm̂)^{T}(y − Gm̂)/N],   (13)

the log of the mean squared residual.
[16] Usually the parameter count K is equal to the number of basis functions plus one (for the variance of the noise), but if one or more basis functions contain the recovery rate γ, it must be included in the count. As we are interested mainly in the long-term rate r, we do not count the intercept as a parameter. (Notice that shifting all the data points by a fixed amount does not change the estimated long-term rate.)
7. Model Likelihood and Model Average
[17] The number of possible models included in equation (4) is 2^{K_{s}+2}, but we exclude all models without rate or intercept. As several models have similar IC scores, we average models based on their prior probability and IC weights [e.g., Burnham and Anderson, 2002, p. 75], referred to here as IC likelihood. We omit model selection error [Buckland et al., 1997] for consistency with methods utilizing only rate and intercept. Our method is related to Bayesian model averaging [Hoeting et al., 1999], but is thought to be less computationally intensive.
[18] In order to define an IC likelihood, it is numerically prudent to first subtract the IC score of the best model from that of all the other models. Each model thus has a delta-IC given by Δ_{j} = IC_{j} − min_{i}(IC_{i}). The IC likelihood of the j^{th} model is then given by

w_{j} = exp(−Δ_{j}/2) / Σ_{i} exp(−Δ_{i}/2).   (14)
The IC likelihoods sum to 1, and are interpreted as model likelihoods conditional on the data used to compute the IC scores. We incorporate prior information about model probabilities using the probability calculus of Tarantola and Valette [1982]. Let π_{j} be the prior probability of model j and μ_{j} be the noninformative probability of model j. The posterior probability of model j is then

p_{j} = (π_{j}w_{j}/μ_{j}) / Σ_{i}(π_{i}w_{i}/μ_{i}).   (15)
If one has no prior information about various models, π_{j} = μ_{j}, and so p_{j} = w_{j}. Here we take the noninformative probability to be uniform, so μ_{j} is the same for each j. (Even when prior probabilities are uniform, the π_{j} are useful. For example, to compute the average of models that do not include a particular storm, set π_{j} = 0 for each model containing that storm.) The 1962 storm has a storm erosion potential index three times greater than other storms [Zhang et al., 2001, Figure 8], and it is prominent in the Cotton Patch Hill data. Accordingly, we give a prior probability of zero to each model that does not include the 1962 storm. We give the model with no storms, only longterm rate, a prior probability half as large as that of models that include the 1962 storm. This is conservative with regard to the uncertainty because the model with no storm has the highest residuals, and one could reasonably exclude all models with no storm. The model priors, likelihoods and posterior probabilities are given in Table 2.
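The weighting scheme can be reproduced from the ΔAICc values and priors listed in Table 2; because the table entries are rounded, the computed weights match the tabulated ones only to about two decimal places:

```python
import numpy as np

# IC likelihoods (Akaike-style weights) and posterior model probabilities
# from the delta-AICc values and priors of Table 2, for the models
# R; R,S29; R,S62; R,S92; R,S29,S62; R,S29,S92; R,S62,S92; R,S29,S62,S92.
delta = np.array([0.78, 1.77, 0.00, 1.72, 0.22, 2.62, 0.80, 0.54])
prior = np.array([0.11, 0.00, 0.22, 0.00, 0.22, 0.00, 0.22, 0.22])

w = np.exp(-delta / 2.0)
w /= w.sum()               # IC likelihoods, summing to 1

# With a uniform noninformative probability, the posterior is the
# normalized product of prior and IC likelihood.
post = prior * w
post /= post.sum()
```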
Table 2. Rates of Shoreline Loss for Various Models, With Their Prior Probabilities, IC Likelihoods and Posterior Probabilities^a

  Model                    Rate ± std (m/yr)   ΔAICc   IC Likelihood   Model Prior   Model Probability
  Averaged model           −0.49 ± 0.01
  R                        −0.59 ± 0.06        0.78    0.13            0.11          0.09
  R, S29                   −0.62 ± 0.06        1.77    0.08            0.00          0.00
  R, S62                   −0.51 ± 0.01        0.00    0.20            0.22          0.27
  R, S92                   −0.69 ± 0.07        1.72    0.08            0.00          0.00
  R, S29, S62              −0.55 ± 0.01        0.22    0.18            0.22          0.24
  R, S29, S92              −0.69 ± 0.07        2.62    0.05            0.00          0.00
  R, S62, S92              −0.44 ± 0.02        0.80    0.13            0.22          0.18
  R, S29, S62, S92         −0.34 ± 0.01        0.54    0.15            0.22          0.21
  R (no post-storm data)   −0.57 ± 0.01

^a 'R' indicates that the model has a long-term rate. 'S62' indicates that the 1962 storm is included in the model, and similarly for other storms. The low-IC model (R, S62) is also the preferred model in an F-test.
[19] To see how model averaging affects predictions, let ϕ be the quantity whose value is to be predicted. The model-averaged ϕ is given by

ϕ̄ = Σ_{j} p_{j} q_{j}^{T} m̂_{j},   (16)

where q_{j}^{T} is the prediction kernel and m̂_{j} is the MLE of the parameter vector for model j.
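For the long-term rate, the prediction kernel simply picks out r for each model, so the averaged rate is the probability-weighted sum of the per-model rates in Table 2 (the rounded table values shift the result slightly from the quoted −0.49 m/yr):

```python
import numpy as np

# Model-averaged long-term rate from the rounded Table 2 entries
# (models with nonzero posterior probability only).
rate = np.array([-0.59, -0.51, -0.55, -0.44, -0.34])
prob = np.array([ 0.09,  0.27,  0.24,  0.18,  0.21])
avg_rate = prob @ rate    # close to the paper's -0.49 m/yr
```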
[20] As the model probabilities p_{j} depend on the data, the formula for the prediction variance σ^{2} requires some care and is derived in the auxiliary material. Figure 1d shows the model average and the probabilities of its component models. Although the model with three storms is not the model with the highest probability, it gives by far the best fit to the data, as shown by its low process error in Table 1; it is interesting and desirable that the average model also shows the effects of all three storms.
8. Discussion and Conclusions
[21] As the times of large storms are known, their effects can be incorporated into models of historical shoreline change by use of the storm function, a one-sided exponential with delay. Parsimony in the form of an IC prevents overfitting the data by inclusion of too many storms, and model averaging is an objective way of reconciling competing models. Subjectivity is explicit in the form of a prior probability for models. The method may have some advantages over other methods, such as least absolute deviations, because it gives a more precise estimate of long-term rate, as well as information about the magnitude of storms (auxiliary material). At Cotton Patch Hill, DE, the minimum long-term rate of shoreline loss is 0.34 ± 0.01 m/y (from a model with all three storms). The model-averaged rate, 0.49 ± 0.01 m/y, is about 16% lower than earlier estimates. The sudden shoreline loss associated with the 1929, 1962 and 1992 storms was 19.4 ± 7.9 m, 94.8 ± 11.7 m and 9.6 ± 6.5 m, respectively. Here we outlined and solved the 1D problem, which is fundamental in shoreline change studies. Our solution to the 2D problem uses the methods of this paper to model the temporal coefficients of alongshore basis functions [Frazer et al., 2009] and will be presented separately.
Acknowledgments
[22] Funding for this study was provided by the U.S. Geological Survey, the University of Hawaii Sea Grant College and the Hawaii Department of Land and Natural Resources.