Standard Article

# Smoothing

Statistical and Numerical Computing

Published Online: 15 SEP 2006

DOI: 10.1002/9780470057339.vas029

Copyright © 2002 John Wiley & Sons, Ltd

Book Title

## Encyclopedia of Environmetrics


#### How to Cite

Kafadar, K. and Horn, P. S. 2006. Smoothing. Encyclopedia of Environmetrics. 4.

#### Publication History

- Published Online: 15 SEP 2006

This is not the most recent version of the article; a revised version was published on 15 JAN 2013.

Smoothing is an exploratory tool that can be applied to various forms of data for various purposes. Tukey [22, Chapter 7] advocated smoothing in his classic work *Exploratory Data Analysis* (EDA) and illustrated its use on data sequenced by a single variable such as time. Like all exploratory tools, the emphasis is on insight and flexibility, in contrast to hypothesizing a specific functional form, estimating parameters, and testing for model adequacy. Although smoothing methodologies are still described most often in the context of one-dimensional data (i.e. a response as a function of one variable), with *time* as the explanatory variable, Tukey [22, Chapter 8 onwards] also showed by example that smoothing on variables other than time, as well as higher-dimensional smoothing (i.e. a response as a function of several variables), can provide insight into functional relationships between variables. In the context of environmental data, this higher-dimensional smoothing often takes the form of bivariate smoothing with a response (e.g. concentration of radon or volume of oil) as a function of two geographical variables (e.g. longitude and latitude).

The decomposition of data into *data* = *fit* + *residuals* or *data* = *smooth* + *rough* [22, p. 208] suggests a need to characterize the ‘smooth’ component of the data, which often leads to insight about the process that generated the data. Some of the goals of smoothing are:

- To reveal the relationship between response and explanatory variables, which may suggest a functional model that describes their connection.
- To magnify the underlying trend (*see* Trend Detecting).
- To reduce attention to unusual values or outliers.
- To examine patterns in the residuals that can be revealed once the smoothed trend has been removed.
- To minimize the effect of aggregated values (sometimes called *binning* [12]), such as incidence measures that might apply to an entire region rather than to a specific point identified with the response.

Notice that these objectives differ from those of interpolation, where the fitted surface is constrained to pass through the observed data values and the goal is to estimate the response when the explanatory variables take on values other than those at hand. Interpolation generally applies when measurement error is thought to be negligible. Most environmental applications involve errors from various sources in the measuring device (e.g. upper and/or lower limits of detection, location, calibration), so smoothing, rather than interpolation, is often more appropriate.

### Classes of Smoothers


Smoothers fall into one of two basic categories: *linear*, including local polynomial smoothing, *loess* (cf. [4] and [13]), splines [25] and kriging [5], and *nonlinear*, such as running medians and other median-based smoothers [14, 22, 24]. This entry introduces the concept of smoothing and the need for it, briefly discusses the smoothers in these two categories, and then presents examples in one and two dimensions.

A simple but illustrative example of both linear and nonlinear smoothing appears in Figure 1. These data are modified measurements of concentrations of a particular contaminant on *n* = 20 consecutive days. In the first row of plots, a shift has been added in the middle of the sequence; in the second row, an outlier has been added. The left column shows the effect of smoothing by running *means* of length 3: ŷ_{i} = (*y*_{i−1} + *y*_{i} + *y*_{i+1})/3, *i* = 2, …, 19. The right column smooths by running *medians* of length 3: ŷ_{i} = median(*y*_{i−1}, *y*_{i}, *y*_{i+1}). (In both cases, ŷ_{i} = *y*_{i} at the endpoints *i* = 1 and 20.) Note that the eye is drawn to the trend line that is shown in each plot, which is one of the goals of smoothing (reduce attention to distracting detail). The smooths in the top two plots suggest immediately a change in level midway across the *x*-axis; the bottom two plots suggest almost constant values, apart from measurement error and possibly an outlier. These plots also show the advantage of smoothing by running medians in these two situations: median smoothing responds more quickly to abrupt features (vs. the more gradual shift when smoothing by means), and also is not influenced by single outliers unsupported by neighboring values (in contrast to the ‘tent’ constructed by the means smoother). Similar characteristics apply generally to other nonlinear smoothers. Outliers should be detectable readily in the *residuals* (original values − smoothed values); residuals from the median smooth are similar except for the one value at *i* = 10, whereas the mean smooth presents *three* potential outliers at *i* = 9, 10 and 11.
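These two smoothers are simple enough to sketch directly. The following Python fragment (an illustration, not part of the original S-PLUS analysis) reproduces the qualitative behavior described above on two toy sequences, a level shift and a single spike:

```python
import numpy as np

def running_mean3(y):
    """Length-3 running mean; endpoints are copied unchanged."""
    y = np.asarray(y, dtype=float)
    s = y.copy()
    s[1:-1] = (y[:-2] + y[1:-1] + y[2:]) / 3.0
    return s

def running_median3(y):
    """Length-3 running median; endpoints are copied unchanged."""
    y = np.asarray(y, dtype=float)
    s = y.copy()
    s[1:-1] = np.median(np.column_stack([y[:-2], y[1:-1], y[2:]]), axis=1)
    return s

# A level shift: the median smooth jumps abruptly at the shift,
# while the mean smooth ramps up gradually over two points.
shift = np.array([0.0, 0, 0, 0, 0, 5, 5, 5, 5, 5])

# A single outlier: the median smooth ignores it entirely, while the
# mean smooth spreads it into a three-point 'tent'.
spike = np.array([0.0, 0, 0, 0, 9, 0, 0, 0, 0, 0])
```

Computing `running_median3(spike)` returns all zeros (the spike is unsupported by its neighbors), whereas `running_mean3(spike)` spreads the value 9 into three smoothed values of 3.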

### Models and Characteristics of Smoothers


In the example above, the underlying model for data that motivates smoothing, *data* = *smooth* + *residual*, can be written more formally as

- *y*_{i} = *f*(*x*_{i}) + *r*_{i},  *i* = 1, …, *n* (1)

We assume that *f* is reasonably smooth (apart from possible ridges or abrupt changes), and that *r*_{i} represents a mild departure (or occasionally a serious outlier) from the true *f*(*x*_{i}). Note that *x*_{i} may involve more than one variable; for geographical smoothing, **x**_{i} = (*x*_{i1}, *x*_{i2}) = (longitude, latitude). One way to understand the action of smoothing by running means is to consider the case where *f* is reasonably constant in a neighborhood of *x*_{i} and the residuals *r*_{i} are independent with constant variance σ^{2}. Then *f*(*x*_{i−1}) ≈ *f*(*x*_{i}) ≈ *f*(*x*_{i+1}), so the running mean (*y*_{i−1} + *y*_{i} + *y*_{i+1})/3 has expectation approximately equal to the desired *f*(*x*_{i}), while its variance is reduced to one-third of σ^{2}. A wider *span* will reduce the variance even further, at the expense of introducing *bias* into the estimate [i.e. *f*(*x*) may not look much like *f*(*x*_{i}) if *x* is far from *x*_{i}], leading to the familiar *bias–variance* tradeoff. When the values of *y*_{i} and *x*_{i} are assumed to arise as realizations of some (possibly multivariate) random variables *Y* and *X*, and the mean of the residuals *r*_{i} is zero (for otherwise its mean could be incorporated into *f*), then *f*(*x*_{i}) may be expressed as the conditional expectation E(*Y*|*X* = *x*_{i}). This is the usual situation for *linear regression* when *f*(*x*_{i}) is a line (e.g. *a* + *bx*_{i} for some *a* and *b*; *see* Linear Models) or, more generally, for *parametric regression* when the function *f* is specified in terms of parameters (e.g. *a* and *b* above), and for *nonparametric regression* when the functional form of *f*(·) is left completely unspecified.
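The variance argument above is easy to verify by simulation. This sketch (illustrative only, with an assumed constant *f* = 10) checks that the length-3 running mean is approximately unbiased for *f*(*x*_{i}) with variance near σ^{2}/3:

```python
import numpy as np

# Monte Carlo check: when f is locally constant, the length-3 running
# mean is approximately unbiased for f(x_i) with variance sigma^2 / 3.
rng = np.random.default_rng(0)
sigma = 2.0
# 100,000 independent neighborhoods of 3 points, each with f == 10 locally.
y = 10.0 + rng.normal(0.0, sigma, size=(100_000, 3))
smooth = y.mean(axis=1)        # the running mean at each middle point

mean_est = smooth.mean()       # close to f = 10 (approximately unbiased)
var_est = smooth.var()         # close to sigma**2 / 3
```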

Characteristics of a good smoother are:

- it should recreate *f* as accurately as possible;
- it should recapture perfectly linear trends or surfaces;
- it should handle unevenly spaced data;
- its output should be ‘smooth’ (except at sharp break points);
- extreme outliers should be properly ignored by the smooth and stand out clearly in the residuals.
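The second property can be checked directly for the simple running mean: the mean of three equally spaced points on a line is the middle point, so a linear trend is recaptured exactly. A minimal sketch:

```python
import numpy as np

def running_mean3(y):
    """Length-3 running mean; endpoints are copied unchanged."""
    y = np.asarray(y, dtype=float)
    s = y.copy()
    s[1:-1] = (y[:-2] + y[1:-1] + y[2:]) / 3.0
    return s

x = np.arange(20)
trend = 2.0 * x + 1.0    # a perfectly linear trend
# The smoother reproduces the trend exactly: for equally spaced x,
# (y_{i-1} + y_i + y_{i+1}) / 3 lies on the same line at x_i.
assert np.allclose(running_mean3(trend), trend)
```

The same holds for the length-3 running median, since the median of a monotone triple is its middle value.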

### Linear Smoothers


*Linear* smoothers can always be expressed as a linear function of the data, e.g. ŷ_{i} = ∑_{j} *w*_{ij}*y*_{j}. The weights usually depend upon the target point being smoothed, e.g. in the running means example *w*_{ij} = 1/3 for *j* = *i* − 1, *i*, *i* + 1, whereas *w*_{ij} = 0 for all other *j*. The *span* of the smoother is usually defined as the proportion of the weights that are nonzero; the larger the span, the smoother (and less variable) the surface, but the greater the potential bias in the estimate of *f*. Most nonparametric regression estimators take the form of linear smoothers, as can be seen in the equations for kernel-based estimators and loess [(2) and (5) in Nonparametric Regression Model].
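The weight representation can be made concrete by assembling the weights into a matrix, so that the whole smooth is a single matrix–vector product. A sketch for the length-3 running mean (with copied endpoints, as in the first example):

```python
import numpy as np

def running_mean_matrix(n):
    """Weight matrix W of the length-3 running mean: w_ij = 1/3 for
    j = i-1, i, i+1 on interior rows; endpoint rows copy the data."""
    W = np.zeros((n, n))
    W[0, 0] = W[-1, -1] = 1.0
    for i in range(1, n - 1):
        W[i, i - 1:i + 2] = 1.0 / 3.0
    return W

y = np.array([1.0, 2.0, 6.0, 2.0, 1.0])
W = running_mean_matrix(len(y))
smooth = W @ y   # any linear smoother is y-hat = W y
# The span is the proportion of nonzero weights per row (3/n here);
# each row of W sums to 1, so constants are reproduced exactly.
```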

Spline smoothers (*see* Splines in Nonparametric Regression) may be viewed as extensions of the simple running mean/median smoothers, which implicitly assume that *f* is roughly constant in local neighborhoods, and of loess, which assumes that *f* is locally polynomial (usually linear, occasionally quadratic). Spline smoothers fit different polynomials in different segments of the data, where constraints are imposed to assure smoothness between segments at the knots that define segment endpoints. Various examples of spline smoothing appear in [2], [19] and [25].
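As a concrete illustration, a smoothing spline can be fit with SciPy's `UnivariateSpline` (one readily available implementation, not the software used in the works cited above). The smoothing factor `s` plays the role of the span: `s = 0` interpolates through every point, while larger values trade fidelity for smoothness.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)

# Cubic smoothing spline (k=3); s is the target sum of squared
# residuals -- here set to roughly n * sigma^2 for noise sd 0.2.
spline = UnivariateSpline(x, y, k=3, s=x.size * 0.2 ** 2)
fitted = spline(x)
```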

When smoothing geographical data, two other linear smoothers are common: empirical Bayes smoothing [3], where the weights depend upon the prior distribution of the responses *y*_{i} (designed specifically for mapping disease rates; *see* Disease Mapping), and kriging [15]. Kriging finds the best linear unbiased predictor of the response at a target location by estimating the parameters in a model for the covariance function of the response *Y* at various locations; usually this covariance function is defined in terms of the distance between these locations, and the ‘smoothness’ (i.e. local variability) of the surface is governed by these parameters [20, Chapter 4].

Another useful linear smoother is the multivariate adaptive regression spline (MARS) [10], wherein the ‘smooth’ takes the form

- ŷ_{i} = *a*_{0} + ∑_{m=1}^{M} *a*_{m}*B*_{m}(*x*_{ij}) (2)

where **x**_{i} = (*x*_{i1}, …, *x*_{iJ}) denotes the *J* components of **x** [e.g. *J* = 2 for geographical smoothing and **x**_{i} = (latitude, longitude)], and the basis function *B*_{m}(*x*_{ij}) is a truncated power function of the single variable *x*_{ij}, *j* = 1, …, *J* {i.e. *B*_{m}(*u*) = [(*u* − *c*_{0})^{q}]_{+}, where (*y*)_{+} = max[*y*, 0]}. MARS differs from typical spline smoothers in that the knots are not prespecified, but rather are found adaptively from the data themselves, and the basis functions are truncated linear functions, rather than polynomials. De Veaux et al. [8] developed a model for the spatial variations in seafloor topography that was suggested from applying the MARS smoother on data from sea ice concentrations in Antarctica. Similarly, wavelet smoothing proceeds by assuming that the smooth is a linear combination of some limited number of basis functions; various methods have been proposed for dictating which coefficients of these functions are nonzero and which can be omitted to yield a smooth approximation to the data [9]. MARS is particularly effective for exploratory purposes, but not when the process that generates the data is known to involve differential equations [7].
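The truncated linear basis is simple to implement. The following sketch fits a hinge basis by least squares with the knot treated as known; MARS itself searches knot locations adaptively, which this illustration omits:

```python
import numpy as np

def hinge(u, c):
    """Truncated linear basis (u - c)_+ = max(u - c, 0), the MARS building block."""
    return np.maximum(np.asarray(u, dtype=float) - c, 0.0)

# Synthetic data: flat at level 1 for x < 4, then a line of slope 2.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 200)
y = 1.0 + 2.0 * hinge(x, 4.0) + rng.normal(0.0, 0.1, x.size)

c = 4.0  # knot location, assumed known here
# Basis: intercept, (x - c)_+ and its mirror (c - x)_+.
X = np.column_stack([np.ones_like(x), hinge(x, c), hinge(c - x, 0.0)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # roughly [1, 2, 0]
```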

### Nonlinear Smoothers


Nonlinear functions may actually be approximated fairly well by linear smoothers, as long as the function is *locally* linear and the smoother uses an appropriate span that does not smooth over nonlinear features. While linear smoothers are straightforward theoretically and easy to implement, they can often fail to capture the full extent of interesting features: peaks are squashed, troughs are raised, and abrupt shifts (e.g. mountain ranges) appear as gradual changes; sometimes linear smoothers smooth over these features completely. (The use of truncated linear basis functions in MARS and flexible basis functions in wavelet smoothing moderates this tendency to some degree.) In such situations, nonlinear smoothers can be more satisfactory, as shown in the first example. The mathematical operator in such nonlinear smoothers is usually the median. Nonlinear smoothers built from various combinations of spans of running medians have been defined and illustrated for one-dimensional data by Tukey [22], Velleman [24] and Goodall [11], among others.
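One such combination is repeated running medians of 3 (written ‘3R’ in Tukey's notation): the sequence is re-smoothed until it no longer changes. A minimal sketch:

```python
import numpy as np

def running_median3(y):
    """Length-3 running median; endpoints are copied unchanged."""
    y = np.asarray(y, dtype=float)
    s = y.copy()
    s[1:-1] = np.median(np.column_stack([y[:-2], y[1:-1], y[2:]]), axis=1)
    return s

def smooth_3R(y, max_iter=20):
    """Repeated running medians of 3 ('3R'): re-smooth until the
    sequence stops changing (it always converges in finitely many passes)."""
    s = np.asarray(y, dtype=float)
    for _ in range(max_iter):
        t = running_median3(s)
        if np.array_equal(t, s):
            break
        s = t
    return s
```

A single unsupported spike is removed entirely, while monotone sequences pass through unchanged.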

The electrical engineering literature has used median filters in two dimensions for image processing [17, 26]. Cressie [6] suggested a variant of this filter, called the *median polish smoother*, which does not require evenly spaced data such as pixels on a regular grid. Since the method is sensitive to orientation, Cressie recommends following the filter by kriging the residuals and adding the smoothed residuals to the result from the median polish filter. Another nonlinear smoother for two-dimensional data was proposed by Tukey and Tukey [23], but their ‘headbanging’ smoother can be shown to fail to capture linear surfaces. Both MARS and wavelet smoothing are likely to perform better when high-dimensional data involve ridges, edges or abrupt changes.
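The core of the median polish idea is an additive decomposition of a gridded table into overall, row and column effects, obtained by alternately sweeping out medians. A minimal sketch of that decomposition step (without the follow-up kriging of residuals that Cressie recommends):

```python
import numpy as np

def median_polish(z, n_iter=10):
    """Fit z_ij ~ overall + row_i + col_j by alternately sweeping
    row and column medians out of the residuals (Tukey's median polish)."""
    resid = np.asarray(z, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        # sweep row medians out of the residuals
        rmed = np.median(resid, axis=1)
        row += rmed
        resid -= rmed[:, None]
        m = np.median(row)       # recenter row effects into the overall term
        row -= m
        overall += m
        # sweep column medians out of the residuals
        cmed = np.median(resid, axis=0)
        col += cmed
        resid -= cmed[None, :]
        m = np.median(col)
        col -= m
        overall += m
    fit = overall + row[:, None] + col[None, :]
    return fit, resid            # z == fit + resid always holds

# An exactly additive table is recovered perfectly (zero residuals).
z = np.add.outer(np.array([0.0, 1.0, 2.0]), np.array([0.0, 10.0]))
fit, resid = median_polish(z)
```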

Regardless of which smoother is applied, two principles should be kept in mind. First, *some* smoothing of any form is almost always valuable. While the data are changed by smoothing (e.g. from *y*_{i} to ŷ_{i}), one can argue that they are changed only slightly, and that the potential benefits (reduced noise, insight into functional relationships, etc.) far outweigh this concern. Second, process knowledge should always be incorporated first into any fitting strategy. The main strength of smoothers is their ability to suggest functional forms when such knowledge is not available, or to reveal structure in the residuals from a previously identified, possibly lower-order model (as illustrated by Example 1 below).

Example 1. Carbon dioxide at Mauna Loa. Figure 2 shows monthly concentrations (ppm) of carbon dioxide measured at the Mauna Loa (Hawaii) volcano, from January 1959 to December 1990. The data were collected by the Scripps Institution of Oceanography in La Jolla, CA, and are described in the S-PLUS documentation [16]. The most obvious trends are the increase over time (mostly linear, but with a slight curvature) and regular yearly cycles. After first removing a quadratic trend and the month effects [i.e. subtracting a quadratic function of (year − 1975) and then the January, February, etc., average from all January, February, etc., months], the resulting residuals are shown in Figure 3. Five smoothers of these residuals are shown in Figure 4; they all suggest some elevated concentrations (above and beyond the overall increase and monthly effects) in 1960–1965, 1973–1974 and 1978–1985. Running medians of length 3 are the least smooth, but also best capture the large peaks in August 1973 and in January 1988 to July 1989; as the span of this smoother increases to 5 and 11, the peaks become squashed. The two smoothest curves are loess with span 0.10 and loess with span 0.25; the former succeeds in capturing the general shape of the residuals over time but severely attenuates the more interesting features. The tradeoff between bias and variance in this example is well illustrated.
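The two-step removal of trend and month effects is straightforward to sketch. The series below is a synthetic stand-in for the Mauna Loa data (the actual data ship with S-PLUS [16]); the detrending steps are the ones described above:

```python
import numpy as np

# Hypothetical monthly series standing in for the Mauna Loa CO2 data:
# quadratic growth around 1975, a seasonal cycle, and noise.
rng = np.random.default_rng(3)
years = np.arange(1959, 1991)
t = np.array([yr + (m + 0.5) / 12.0 for yr in years for m in range(12)])
month = np.tile(np.arange(12), years.size)
co2 = (316.0 + 0.8 * (t - 1975) + 0.01 * (t - 1975) ** 2
       + 3.0 * np.sin(2 * np.pi * month / 12)
       + rng.normal(0.0, 0.3, t.size))

# Step 1: subtract a quadratic function of (year - 1975).
coef = np.polyfit(t - 1975, co2, deg=2)
resid = co2 - np.polyval(coef, t - 1975)

# Step 2: subtract each calendar month's average from that month's values.
for m in range(12):
    resid[month == m] -= resid[month == m].mean()
```

What remains in `resid` is, by construction here, essentially the noise; with real data it is these residuals that the five smoothers of Figure 4 are applied to.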

Example 2. Two-dimensional data. Nonlinear smoothers are not always so successful, especially in higher dimensions where measurement error is negligible. Figure 5(a) shows a smooth surface without error; normal (0, 0.01) error has been added to this surface in Figure 5(b). A median polish fit is shown in Figure 5(c); notice that it fails to capture well the nearly constant surface at values of *x* > 8. Cressie's median polish smoother [6] (which takes the result of Figure 5(c) and adds smoothed residuals) appears in Figure 5(d); the smoother recaptures the trend at the upper end of *x* but still suggests undulations that did not exist in the true underlying surface. In contrast, the loess smooths tend to reproduce the surface more faithfully, especially when the span lies in the range 0.10 (Figure 5e) to 0.30 (Figure 5f). Objective criteria for choosing the span have been proposed (e.g. cross-validation), but most people still prefer to choose the span ‘by eye’.
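Leave-one-out cross-validation is one such objective criterion: each observation is predicted from a smooth fit to the remaining data, and the span minimizing the average squared prediction error is chosen. A sketch for a simple nearest-neighbor running mean (the neighborhood size `k` standing in for the span):

```python
import numpy as np

def loo_cv_score(x, y, k):
    """Leave-one-out CV error of a k-nearest-neighbour running mean:
    predict each y_i from its k nearest x-neighbours, excluding y_i."""
    n = len(y)
    err = 0.0
    for i in range(n):
        d = np.abs(x - x[i])
        d[i] = np.inf                 # exclude the point itself
        nbrs = np.argsort(d)[:k]
        err += (y[i] - y[nbrs].mean()) ** 2
    return err / n

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 80)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

scores = {k: loo_cv_score(x, y, k) for k in (2, 4, 8, 16, 32)}
best_k = min(scores, key=scores.get)  # an objective stand-in for choosing 'by eye'
```

Small `k` leaves the variance high; large `k` smooths over the sine wave and incurs bias; the CV score balances the two.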

### Further Reading


Two particularly useful books on smoothing are Simonoff [21] and Bowman and Azzalini [1]; the former emphasizes applications of smoothing for density estimates, while the latter emphasizes more general smoothing applications as described in this entry. Both concentrate on one-dimensional smoothing. These books, as well as the entry on Nonparametric Regression Model, also discuss the issue of selecting a bandwidth. Ripley [20] and Cressie [5] offer more methods and examples for higher-dimensional data. O'Sullivan [18] presents a discussion of robust smoothing.

### References


- 1. Bowman, A.W. & Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis, Oxford University Press, London.
- 2. (1997). Multivariate regression splines, Computational Statistics and Data Analysis 26, 71–82.
- 3. Clayton, D. & Kaldor, J. (1987). Empirical Bayes estimates of age-standardized relative risks for use in disease mapping, Biometrics 43, 671–682.
- 4. Cleveland, W.S. & Devlin, S.J. (1988). Locally weighted regression: an approach to regression analysis by local fitting, Journal of the American Statistical Association 83, 596–610.
- 5. Cressie, N. (1986). Kriging nonstationary data, Journal of the American Statistical Association 81, 625–634.
- 6. Cressie, N.
- 7. (1999). Hybrid neural network models for environmental process control, Environmetrics 10, 225–236.
- 8. De Veaux, R.D. et al. (1993). Modeling of topographic effects on Antarctic sea ice using multivariate adaptive regression splines, Journal of Geophysical Research 98, 20307–20319.
- 9. Donoho, D.L. & Johnstone, I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage, Journal of the American Statistical Association 90, 1200–1224.
- 10. Friedman, J.H. (1991). Multivariate adaptive regression splines, The Annals of Statistics 19, 1–141.
- 11. Goodall, C. (1991). A survey of smoothing techniques, in Modern Methods of Data Analysis, Chapter 3, J. Fox & J.S. Long, eds, Sage, Beverly Hills, pp. 126–176.
- 12. (1998). Binning, in Encyclopedia of Statistical Sciences, Update Vol. 2, S. Kotz, C.B. Read & D.C. Banks, eds, Wiley, New York, pp. 64–65.
- 13. Hastie, T. & Loader, C. (1993). Local regression: automatic kernel carpentry (with discussion), Statistical Science 8, 120–143.
- 14. Mallows, C.L. (1980). Some theory of nonlinear smoothers, The Annals of Statistics 8, 695–715.
- 15.
- 16. Mathsoft (1993). S-PLUS, Unix Version.
- 17. Narendra, P.M. (1981). A separable median filter for image noise smoothing, IEEE Transactions on Pattern Analysis and Machine Intelligence 3, 20–29.
- 18. O'Sullivan, F. (1988). Robust smoothing, in Encyclopedia of Statistical Sciences, Vol. 8, S. Kotz, N.L. Johnson & C. Read, eds, Wiley, New York, pp. 170–173.
- 19. Ramsay, J.O. (1988). Monotone regression splines in action (with discussion), Statistical Science 3, 424–461.
- 20. Ripley, B.D. (1981). Spatial Statistics, Wiley, New York.
- 21. Simonoff, J.S. (1996). Smoothing Methods in Statistics, Springer, New York.
- 22. Tukey, J.W. (1977). Exploratory Data Analysis, Addison-Wesley, Reading.
- 23. Tukey, P.A. & Tukey, J.W. (1981). Graphic display of data sets in 3 or more dimensions, in Interpreting Multivariate Data, V. Barnett, ed., Wiley, Chichester, pp. 189–275. [Reprinted in Cleveland, W.S., ed. (1988). The Collected Works of John W. Tukey, Vol. V: Graphics, 1965–1975, Wadsworth, Belmont, pp. 188–288.]
- 24. Velleman, P.F. (1980). Definition and comparison of robust nonlinear data smoothing algorithms, Journal of the American Statistical Association 75, 609–615.
- 25.
- 26. (1981). The effect of median filtering on edge location estimation, Computer Graphics and Image Processing 15, 224–245.