Optimization

Statistical and Numerical Computing

Gordon K. Smyth

Published Online: 15 SEP 2006

DOI: 10.1002/9780470057339.vao011

Encyclopedia of Environmetrics

How to Cite

Smyth, G. K. (2006). Optimization. Encyclopedia of Environmetrics, Vol. 4.

Author Information

University of Queensland, Brisbane, Australia

Abstract

Optimization means finding the value of the argument that maximizes or minimizes a given function. The idea is intimately connected with statistical methods such as least squares, maximum likelihood, posterior mode estimation and so on. Many optimization algorithms are based on the idea of solving the nonlinear equation that arises from setting the derivative vector equal to zero. Except in linear cases, optimization almost invariably proceeds by iteration. This article is concerned with unconstrained optimization, in which the function arguments are not specially restricted.
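
For illustration, the following minimal sketch applies Newton–Raphson iteration to the derivative equation g'(x) = 0 for a hypothetical one-dimensional objective; the test function, starting value and tolerance are assumptions chosen for the example rather than prescriptions of the article.

```python
# Minimal sketch: minimize g(x) by iterating on the equation g'(x) = 0.
# The objective, its derivatives, the starting value and the tolerance
# are illustrative assumptions.

def g(x):            # objective to be minimized
    return x**4 - 3 * x**2 + x

def g1(x):           # first derivative g'(x)
    return 4 * x**3 - 6 * x + 1

def g2(x):           # second derivative g''(x)
    return 12 * x**2 - 6

x = 2.0              # starting value, assumed close enough to an optimum
for _ in range(50):
    step = g1(x) / g2(x)      # Newton-Raphson step for g'(x) = 0
    x -= step
    if abs(step) < 1e-10:     # stop when successive iterates agree
        break

print(x, g(x))       # a local minimum of g near the starting value
```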

In general, optimization methods that use derivatives are more powerful than those that do not, although the increase in speed may not always outweigh the overhead of computing the derivatives. If first and second derivatives of the function are available, Newton's method is simple and works well. Some sort of backtracking strategy, however, such as a line search or Levenberg–Marquardt damping, is necessary to prevent divergence from a poor starting value. If second derivatives are not available, then quasi-Newton methods, of which Fisher's method of scoring is one, are recommended. Particular applications of Fisher scoring to nonlinear regression and to generalized linear models are described. General-purpose quasi-Newton algorithms build up a working approximation to the second-derivative matrix from successive values of the first derivative. If even first derivatives are unavailable, the Nelder–Mead downhill simplex algorithm is compact and reasonably robust. However, the slightly more complex direction-set methods, or Newton-type methods with finite-difference approximations to the derivatives, should minimize most functions more quickly. The one-dimensional problem is considered separately. In one dimension, once an interval that contains the solution can be supplied, there exist efficient ‘low-technology’ algorithms robust enough to take on all problems.
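
As a concrete sketch of the damped Newton iteration recommended above, the code below minimizes a hypothetical two-parameter test function (Rosenbrock's function) using analytic first and second derivatives together with a simple backtracking line search; the test function, starting value and tuning constants are illustrative assumptions, not part of the article.

```python
# Sketch of Newton's method with a simple backtracking line search.
# The test function (Rosenbrock's function), its derivatives, the starting
# value and the tuning constants are illustrative assumptions.
import numpy as np

def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def grad(x):
    return np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                     200 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[2 - 400 * (x[1] - 3 * x[0]**2), -400 * x[0]],
                     [-400 * x[0], 200.0]])

x = np.array([-1.2, 1.0])              # deliberately poor starting value
for _ in range(100):
    g, H = grad(x), hess(x)
    if np.linalg.norm(g) < 1e-8:       # gradient essentially zero: stop
        break
    step = np.linalg.solve(H, -g)      # full Newton step
    t = 1.0
    while f(x + t * step) > f(x) and t > 1e-8:
        t /= 2                         # backtrack until the function decreases
    x = x + t * step

print(x)                               # should approach the minimum at (1, 1)
```

If derivatives were unavailable, the same function could instead be handed to a derivative-free routine, for example the Nelder–Mead simplex via scipy.optimize.minimize(f, x, method='Nelder-Mead').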

A maximum or minimum of the function can be either global or local, and various heuristics have been used to distinguish one from the other. Any method that relies on the iterative refinement of a single working approximation will fail in the presence of a large number of local optima. Algorithms that are specially designed to cope with these circumstances include simulated annealing and genetic algorithms.
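
To make the contrast concrete, the following bare-bones sketch applies simulated annealing to a hypothetical one-dimensional function with many local minima; the objective, proposal distribution and cooling schedule are illustrative assumptions rather than the article's prescriptions.

```python
# Bare-bones simulated annealing for a multi-modal objective.
# The objective, proposal scale and cooling schedule are illustrative choices.
import math
import random

def f(x):
    # Many local minima; the global minimum is at x = 0.
    return x**2 + 10 * math.sin(3 * x)**2

random.seed(0)
x = 5.0                     # current state
best_x, best_f = x, f(x)
T = 1.0                     # initial temperature

for _ in range(20000):
    x_new = x + random.gauss(0, 0.5)          # random proposal
    delta = f(x_new) - f(x)
    # Accept downhill moves always, uphill moves with probability exp(-delta/T)
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = x_new
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    T = max(1e-3, 0.999 * T)                  # geometric cooling

print(best_x, best_f)       # best point visited during the search
```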