Standard Article

# Smoothing

Statistical and Numerical Computing

K. Kafadar and Paul S. Horn

Published Online: 15 SEP 2006

DOI: 10.1002/9780470057339.vas029

## Encyclopedia of Environmetrics

#### How to Cite

Kafadar, K. and Horn, P. S. (2006). Smoothing. Encyclopedia of Environmetrics, Vol. 4.

#### Author Information

University of Cincinnati, Cincinnati, OH, USA



Smoothing is an exploratory tool that can be applied to various forms of data for various purposes. Tukey [22, Chapter 7] advocated smoothing in his classic work Exploratory Data Analysis (EDA) and illustrated its use on data sequenced by a single variable such as time. Like all exploratory tools, the emphasis is on insight and flexibility, in contrast to hypothesizing a specific functional form, estimating parameters, and testing for model adequacy. Although smoothing methodologies are still described most often in the context of one-dimensional data (i.e. a response as a function of one variable), with time as the explanatory variable, Tukey [22, Chapter 8 onwards] also showed by example that smoothing on variables other than time, as well as higher-dimensional smoothing (i.e. a response as a function of several variables), provides insights into functional relationships between variables. In the context of environmental data, this higher-dimensional smoothing often takes the form of bivariate smoothing with a response (e.g. concentration of radon or volume of oil) as a function of two geographical variables (e.g. longitude and latitude).

The decomposition of data into data = fit + residuals or data = smooth + rough [22, p. 208] suggests a need to characterize the ‘smooth’ component of the data, which often leads to insight about the process that generated the data. Some of the goals of smoothing are:

1. To reveal the relationship between response and explanatory variables, which may suggest a functional model that describes their connections.

2. To magnify the underlying trend (see Trend Detecting).

3. To reduce attention to unusual values or outliers.

4. To examine patterns in the residuals that can be revealed once the smoothed trend has been removed.

5. To minimize the effect of aggregated values (sometimes called binning [12]), such as incidence measures that might apply to an entire region rather than to a specific point identified with the response.

Notice that these objectives differ from those of interpolation, where the fitted surface is constrained to pass through the observed data values and the goal is to estimate the response when the explanatory variables take on values other than those at hand. In particular, interpolation generally applies when measurement error is thought to be negligible. Most environmental applications involve errors from various sources in the measuring device (e.g. upper and/or lower limits of detection, location, calibration), so smoothing, rather than interpolation, is often more appropriate.

### Classes of Smoothers

Smoothers fall into one of two basic categories: linear, including local polynomial smoothing, loess (cf. [4] and [13]), splines [25] and kriging [5], and nonlinear, such as running medians and other median-based smoothers [14, 22, 24]. This entry introduces the concept and need for smoothing, and discusses briefly the smoothers in these two categories followed by examples in one and two dimensions.

A simple but illustrative example of both linear and nonlinear smoothing appears in Figure 1. These data are modified measurements of concentrations of a particular contaminant on n = 20 consecutive days. In the first row of plots, a shift has been added in the middle of the sequence; in the second row, an outlier has been added. The left column shows the effect of smoothing by running means of length 3: ŷi = (yi−1 + yi + yi+1)/3, i = 2, …, 19. The right column smooths by running medians of length 3: ŷi = median(yi−1, yi, yi+1). (In both cases ŷi = yi at the endpoints i = 1 and 20.) Note that the eye is drawn to the trend line shown in each plot, which illustrates one of the goals of smoothing (reduce attention to distracting detail). The smooths in the top two plots suggest immediately a change in level midway across the x-axis; the bottom two plots suggest almost constant values, apart from measurement error and possibly an outlier. These plots also show the advantage of smoothing by running medians in these two situations: median smoothing responds more quickly to abrupt features (vs. the more gradual shift when smoothing by means), and it is not influenced by single outliers unsupported by neighboring values (in contrast to the ‘tent’ constructed by the means smoother). Similar characteristics apply generally to other nonlinear smoothers. Outliers should be detectable readily in the residuals (original values − smoothed values); residuals from the median smooth are similar except for the one value at i = 10, whereas the mean smooth presents three potential outliers at i = 9, 10 and 11.
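The two length-3 smoothers described above can be sketched in a few lines of Python (an illustrative sketch, not part of the original entry; the data are hypothetical):

```python
from statistics import median

def running_mean3(y):
    """Running means of length 3; endpoints are copied unchanged."""
    out = list(y)
    for i in range(1, len(y) - 1):
        out[i] = (y[i - 1] + y[i] + y[i + 1]) / 3
    return out

def running_median3(y):
    """Running medians of length 3; endpoints are copied unchanged."""
    out = list(y)
    for i in range(1, len(y) - 1):
        out[i] = median(y[i - 1:i + 2])
    return out

# A roughly flat series with a single outlier at index 4.
y = [5.0, 5.1, 4.9, 5.0, 9.0, 5.1, 5.0, 4.9]
print(running_mean3(y))    # the outlier leaks into indices 3-5 (a 'tent')
print(running_median3(y))  # the outlier is removed entirely
```

Running the two smoothers on this series shows the ‘tent’ behavior described above: the mean smooth spreads the outlier across three positions, while the median smooth replaces it with a neighboring value.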

### Models and Characteristics of Smoothers

In the example above, the underlying model for data that motivates smoothing, data = smooth + residual, can be written more formally as

yi = f(xi) + ri,   i = 1, …, n   (1)

We assume that f is reasonably smooth (apart from possible ridges or abrupt changes), and that ri represents a mild departure (or occasionally a serious outlier) from the true f(xi). Note that xi may involve more than one variable; for geographical smoothing, xi = (xi1, xi2) = (longitude, latitude). One way to understand the action of smoothing by running means is to consider the case where f is reasonably constant in a neighborhood of xi and the residuals ri are independent with constant variance σ2. Then f(xi−1) ≈ f(xi) ≈ f(xi+1), so the mean of ŷi = (yi−1 + yi + yi+1)/3 is approximately equal to the desired f(xi), while its variance is reduced to one-third of σ2. A wider span will reduce the variance even further, at the expense of introducing bias into the estimate [i.e. f(x) may not look much like f(xi) if x is far from xi], leading to the familiar bias–variance tradeoff. When the values of yi and xi are assumed to arise as realizations of some (possibly multivariate) random variables Y and X, and the mean of the residuals ri is zero (for otherwise its mean could be incorporated into f), then f(xi) may be expressed as the conditional expectation E(Y|X = xi). This is the usual situation for linear regression when f(xi) is a line (e.g. a + bxi for some a and b; see Linear Models) or, more generally, for parametric regression when the function f is specified in terms of parameters (e.g. a and b above), and for nonparametric regression when the functional form of f(·) is left completely unspecified.
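The factor of one-third can be checked with a short simulation (a sketch with synthetic data, assuming f is constant and the residuals are independent normals):

```python
import random
from statistics import variance

random.seed(0)
sigma2 = 1.0  # residual variance

# Simulate many independent length-3 windows of pure noise around a
# constant f(x) = 0 and compare the variance of the window means
# against the theoretical value sigma2 / 3.
means = []
for _ in range(20000):
    window = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(3)]
    means.append(sum(window) / 3)

print(variance(means))  # close to sigma2 / 3
```

The sample variance of the window means settles near 0.333, matching the variance reduction to σ2/3 for a running mean of length 3.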

Characteristics of a good smoother are:

1. it should recreate f as accurately as possible;

2. it should recapture perfectly linear trends or surfaces;

3. it can handle unevenly spaced data;

4. its output should be ‘smooth’ (except at sharp break points);

5. extreme outliers should be properly ignored by the smooth and stand out clearly in the residuals.

### Linear Smoothers

Linear smoothers can always be expressed as a linear function of the data, e.g. ŷi = Σj wij yj. The weights usually depend upon the target point being smoothed; e.g. in the running means example wij = 1/3 for j = i − 1, i, i + 1, whereas wij = 0 otherwise. The span of the smoother is usually defined as the proportion of the weights that are nonzero; the larger the span, the smoother (and less variable) the surface, but the potentially greater the bias in the estimate of f. Most nonparametric regression estimators take the form of linear smoothers, as can be seen in the equations for kernel-based estimators and loess [see (2) and (5) in Nonparametric Regression Model].
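The weight representation ŷi = Σj wij yj can be made concrete for the running-means example; the sketch below (illustrative, not from the original entry) also demonstrates the second characteristic of a good smoother, recapturing linear trends exactly:

```python
def running_mean_weights(n, i):
    """Weights w_ij for a length-3 running mean at target point i;
    endpoints pass through unchanged."""
    w = [0.0] * n
    if i == 0 or i == n - 1:
        w[i] = 1.0
    else:
        w[i - 1] = w[i] = w[i + 1] = 1.0 / 3.0
    return w

def apply_linear_smoother(y):
    """yhat_i = sum_j w_ij * y_j -- the form every linear smoother takes."""
    n = len(y)
    return [sum(wij * yj for wij, yj in zip(running_mean_weights(n, i), y))
            for i in range(n)]

y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(apply_linear_smoother(y))  # a linear trend is reproduced (to rounding)
```

Because the weights in each interior window are symmetric and sum to one, a perfectly linear sequence is returned unchanged (up to floating-point rounding).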

Spline smoothers (see Splines in Nonparametric Regression) may be viewed as extensions of the simple running mean/median smoothers, which implicitly assume that f is roughly constant in local neighborhoods, and of loess, which assumes that f is locally polynomial (usually linear, occasionally quadratic). Spline smoothers fit different polynomials in different segments of the data, where constraints are imposed to assure smoothness between segments at the knots that define segment endpoints. Various examples of spline smoothing appear in [2], [19] and [25].

When smoothing geographical data, two other linear smoothers are common: empirical Bayes smoothing [3], where the weights depend upon the prior distribution of the responses yi (designed specifically for mapping disease rates; see Disease Mapping), and kriging [15]. Kriging finds the best linear predictor of the response Y at each location by estimating the parameter in a model for the covariance function of Y at various locations; usually this covariance function is defined in terms of the distance between locations, and the ‘smoothness’ (i.e. local variability) of the surface is governed by this parameter [20, Chapter 4].

Another useful linear smoother is the multivariate adaptive regression spline (MARS) [10], wherein the ‘smooth’ takes the form

ŷi = a0 + Σm=1,…,M am Bm(xij)   (2)

where xi = (xi1, …, xiJ) denotes the J components of x [e.g. J = 2 for geographical smoothing and xi = (latitude, longitude)], and the basis function Bm(xij) is a truncated power function of the single variable xij, j = 1, …, J {i.e. Bm(u) = [(u − c0)^q]+, where (y)+ = max(y, 0)}. MARS differs from typical spline smoothers in that the knots are not prespecified, but rather are found adaptively from the data themselves, and the basis functions are truncated linear functions, rather than polynomials. De Veaux et al. [8] developed a model for the spatial variations in seafloor topography that was suggested by applying the MARS smoother to data on sea ice concentrations in Antarctica. Similarly, wavelet smoothing proceeds by assuming that the smooth is a linear combination of some limited number of basis functions; various methods have been proposed for dictating which coefficients of these functions are nonzero and which can be omitted to yield a smooth approximation to the data [9]. MARS is particularly effective for exploratory purposes, but not when the process that generates the data is known to involve differential equations [7].
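A tiny sketch of the truncated linear basis can make the MARS form concrete. The coefficients and knots below are hypothetical and fixed by hand; real MARS chooses the knots adaptively from the data:

```python
def hinge(u, c):
    """Truncated linear basis (u - c)_+ = max(u - c, 0), the q = 1 case
    of the truncated power functions used by MARS."""
    return max(u - c, 0.0)

def mars_like(x, a0=1.0, a1=2.0, a2=-3.0, c1=2.0, c2=5.0):
    """A hypothetical one-dimensional 'smooth' built from two fixed knots:
    piecewise linear, continuous, with slope changes at c1 and c2."""
    return a0 + a1 * hinge(x, c1) + a2 * hinge(x, c2)

print([mars_like(x) for x in [0.0, 2.0, 4.0, 6.0]])
```

The resulting function is constant below the first knot, then piecewise linear with a slope change at each knot, which is exactly how truncated linear bases build up a smooth.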

### Nonlinear Smoothers

Nonlinear functions may actually be approximated fairly well by linear smoothers, as long as the function is locally linear and the smoother uses an appropriate span that does not smooth over nonlinear features. While linear smoothers are straightforward theoretically and are easy to implement, they can also often fail to capture the extent of interesting features: peaks are squashed, troughs are raised, and abrupt shifts (e.g. mountain ranges) appear as gradual changes; sometimes linear smoothers smooth over these features completely. (The use of truncated linear basis functions in MARS and flexible basis functions in wavelet smoothing moderates this tendency to some degree.) However, for such situations, nonlinear smoothers can be more satisfactory, as shown in the first example. The mathematical operator in such nonlinear smoothers is usually the median. Proposals for nonlinear smoothers using various combinations of spans of running medians have been defined and illustrated for one-dimensional data by Tukey [22], Velleman [24] and Goodall [11], among others.
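Compound median smoothers of the kind proposed by Tukey can be sketched briefly; the version below repeats running medians of 3 until the sequence stops changing, in the style of Tukey's ‘3R’ rule (an illustrative sketch with hypothetical data, not the full compound smoothers of [22] or [24]):

```python
from statistics import median

def med3(y):
    """One pass of running medians of length 3 (endpoints unchanged)."""
    out = list(y)
    for i in range(1, len(y) - 1):
        out[i] = median(y[i - 1:i + 2])
    return out

def smooth_3R(y):
    """Repeat running medians of 3 until the sequence no longer changes."""
    prev, cur = list(y), med3(y)
    while cur != prev:
        prev, cur = cur, med3(cur)
    return cur

# Hypothetical data: a level shift plus two isolated spikes.
y = [2, 2, 9, 1, 2, 3, 8, 3, 3]
print(smooth_3R(y))
```

The repeated medians remove both spikes while preserving the abrupt change in level, the behavior that distinguishes nonlinear smoothers in the discussion above.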

The electrical engineering literature has used median filters in two dimensions for image processing [17, 26]. Cressie [6] suggested a variant of this filter, called the median polish smoother, which does not require evenly spaced data such as pixels on a regular grid. Since the method is sensitive to orientation, Cressie recommends following the filter by kriging the residuals and adding the smoothed residuals to the result from the median polish filter. Another nonlinear smoother for two-dimensional data was proposed by Tukey and Tukey [23], but their ‘headbanging’ smoother can be shown to fail to capture linear surfaces. Both MARS and wavelet smoothing are likely to perform better when high-dimensional data involve ridges, edges or abrupt changes.
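Plain median polish (without the follow-up kriging of residuals that Cressie recommends) can be sketched for a small table; this is an illustrative simplification with hypothetical data:

```python
from statistics import median

def median_polish(table, n_iter=5):
    """Simplified median polish in Tukey's style: alternately sweep row
    and column medians out of the residuals, accumulating an overall
    effect plus row and column effects."""
    rows, cols = len(table), len(table[0])
    resid = [row[:] for row in table]
    overall, row_eff, col_eff = 0.0, [0.0] * rows, [0.0] * cols
    for _ in range(n_iter):
        for i in range(rows):                      # sweep rows
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(row_eff)                        # recentre row effects
        overall += m
        row_eff = [r - m for r in row_eff]
        for j in range(cols):                      # sweep columns
            m = median([resid[i][j] for i in range(rows)])
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        m = median(col_eff)                        # recentre column effects
        overall += m
        col_eff = [c - m for c in col_eff]
    return overall, row_eff, col_eff, resid

# A purely additive table (row effect + column effect, hypothetical data):
table = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]]
overall, row_eff, col_eff, resid = median_polish(table)
print(resid)  # residuals are (numerically) zero for an additive table
```

For an exactly additive table the residuals vanish; on real gridded data, the residuals are what Cressie's variant then smooths by kriging and adds back.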

Regardless of which smoother is applied, two principles should be kept in mind. First, some smoothing of any form is almost always valuable. While the data are changed by smoothing (e.g. from yi to ŷi), one can argue that they are changed only slightly, and that the potential benefits (reduced noise, insight into functional relationships, etc.) far outweigh this concern. Second, process knowledge should always be incorporated first into any fitting strategy. The main strength of smoothers is their ability to suggest functional forms when such knowledge is not available, or to reveal structure in the residuals from a previously identified, possibly lower-order model (as illustrated by Example 1 below).

Example 1. Carbon dioxide at Mauna Loa. Figure 2 shows monthly concentrations (ppm) of carbon dioxide at the Mauna Loa (Hawaii) volcano, from January 1959 to December 1990. The data were collected by the Scripps Institution of Oceanography in La Jolla, CA, and are described in the S-PLUS documentation [16]. The most obvious trends are the increase over time (mostly linear, but with a slight curvature) and regular yearly cycles. After first removing a quadratic trend and the month effects [i.e. subtracting a quadratic function of (year − 1975) and subtracting the January, February, etc., average from all January, February, etc., months], the resulting residuals are shown in Figure 3. Five smooths of these residuals are shown in Figure 4; they all suggest some elevated concentrations (above and beyond the overall increase and monthly effects) in 1960–1965, 1973–1974 and 1978–1985. Running medians of length 3 are the least smooth, but best capture the large peaks in August 1973 and from January 1988 to July 1989; as the span of this smoother increases to 5 and 11, the peaks become squashed. The two smoothest curves are loess with span 0.10 and loess with span 0.25; the former succeeds in capturing the general shape of the residuals over time but severely attenuates the more interesting features. The tradeoff between bias and variance is well illustrated in this example.

Example 2. Two-dimensional data. Nonlinear smoothers are not always so successful, especially in higher dimensions where measurement error is negligible. Figure 5(a) shows a smooth surface without error; normal (0, 0.01) error has been added to this surface in Figure 5(b). A median polish fit is shown in Figure 5(c); notice that it fails to capture well the nearly constant surface at values of x > 8. Cressie's median polish smoother [6] (which takes the result of Figure 5(c) and adds smoothed residuals) appears in Figure 5(d); the smoother recaptures the trend at the upper end of x but still suggests undulations that did not exist in the true underlying surface. In contrast, the loess smooths tend to reproduce the surface more faithfully, especially when the span lies in the range 0.10 (Figure 5e) to 0.30 (Figure 5f). Objective criteria for choosing the span have been proposed (e.g. cross-validation), but most people still prefer to choose the span ‘by eye’.
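One objective criterion for the span, leave-one-out cross-validation, can be sketched for a simple running mean; the half-width k, the candidate spans, and the data below are all hypothetical, and this is a sketch of the idea rather than loess's own span-selection machinery:

```python
from statistics import fmean

def loo_cv_score(y, k):
    """Leave-one-out CV for a running mean of half-width k: predict each
    y_i from the mean of its window neighbors, excluding y_i itself,
    and return the average squared prediction error."""
    n, sse = len(y), 0.0
    for i in range(n):
        nbrs = [y[j] for j in range(max(0, i - k), min(n, i + k + 1)) if j != i]
        sse += (y[i] - fmean(nbrs)) ** 2
    return sse / n

# A short, noisy, roughly linear series (hypothetical data).
y = [0.0, 1.2, 1.9, 3.1, 4.0, 4.8, 6.2, 7.1, 7.9, 9.0]
scores = {k: loo_cv_score(y, k) for k in (1, 2, 3)}
print(scores)
```

On a series this short, the wider windows are heavily penalized by their biased, asymmetric fits near the endpoints, so CV favors the narrowest span; on longer series the minimizing span reflects the bias–variance tradeoff directly.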

Two particularly useful books on smoothing are Simonoff [21] and Bowman and Azzalini [1]; the former emphasizes applications of smoothing for density estimates, while the latter emphasizes more general smoothing applications as described in this entry. Both concentrate on one-dimensional smoothing. These books, as well as the entry on Nonparametric Regression Model, also discuss the issue of selecting a bandwidth. Ripley [20] and Cressie [5] offer more methods and examples for higher-dimensional data. O'Sullivan [18] presents a discussion of robust smoothing.

### References

1. Bowman, A.W. & Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis, Oxford University Press, London.
2. (1997). Multivariate regression splines, Computational Statistics and Data Analysis 26, 71–82.
3. Clayton, D. & Kaldor, J. (1987). Empirical Bayes estimates of age-standardized relative risks for use in disease mapping, Biometrics 43, 671–682.
4. Cleveland, W.S. & Devlin, S.J. (1988). Locally weighted regression: an approach to regression analysis by local fitting, Journal of the American Statistical Association 83, 596–610.
5. Cressie, N. (1986). Kriging nonstationary data, Journal of the American Statistical Association 81, 625–634.
6. Cressie, N. (1993). Statistics for Spatial Data, Revised Edition, Wiley, New York.
7. (1999). Hybrid neural network models for environmental process control, Environmetrics 10, 225–236.
8. De Veaux, R.D. et al. (1993). Modeling of topographic effects on Antarctic sea ice using multivariate adaptive regression splines, Journal of Geophysical Research 98, 20 307–20 319.
9. Donoho, D.L. & Johnstone, I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage, Journal of the American Statistical Association 90, 1200–1224.
10. Friedman, J.H. (1991). Multivariate adaptive regression splines, The Annals of Statistics 19, 1–141.
11. Goodall, C. (1991). A survey of smoothing techniques, in Modern Methods of Data Analysis, Chapter 3, J. Fox & J.S. Long, eds, Sage, Beverly Hills, pp. 126–176.
12. (1998). Binning, in Encyclopedia of Statistical Sciences, Update Vol. 2, S. Kotz, C.B. Read & D.L. Banks, eds, Wiley, New York, pp. 64–65.
13. Hastie, T. & Loader, C. (1993). Local regression: automatic kernel carpentry (with discussion), Statistical Science 8, 120–143.
14. Mallows, C.L. (1980). Some theory of nonlinear smoothers, The Annals of Statistics 8, 695–715.
15. Matheron, G. (1963). Principles of geostatistics, Economic Geology 58, 1246–1266.
16. Mathsoft (1993). S-PLUS, Unix Version.
17. Narendra, P.M. (1981). A separable median filter for image noise smoothing, IEEE Transactions on Pattern Analysis and Machine Intelligence 3, 20–29.
18. O'Sullivan, F. (1988). Robust smoothing, in Encyclopedia of Statistical Sciences, Vol. 8, S. Kotz, N.L. Johnson & C.B. Read, eds, Wiley, New York, pp. 170–173.
19. Ramsay, J.O. (1988). Monotone regression splines in action (with discussion), Statistical Science 3, 424–461.
20. Ripley, B.D. (1981). Spatial Statistics, Wiley, New York.
21. Simonoff, J.S. (1996). Smoothing Methods in Statistics, Springer-Verlag, New York.
22. Tukey, J.W. (1977). Exploratory Data Analysis, Addison-Wesley, Reading, MA.