On the Computation of Convergence Regions for Sequential Nonlinear Programming Problems

In this work, we formulate and solve the problem of finding the ball of maximum radius, centered at a local minimizer of a nonlinear optimization problem, that is invariant with respect to the gradient descent method. This problem arises in the context of solving sequences of nonlinear optimization problems, in which one usually strives to converge to qualitatively similar solutions. We illustrate our idea with an example of a nonlinear function of two variables.


Introduction
In many real-world applications, one has to solve a sequence of related nonlinear optimization problems (NLPs). For instance, a certain NLP might undergo perturbations that produce a set of "shifted" problems to be solved consecutively. In that case, one strives to avoid "jumps" from one local minimizer to another, which might occur in such a scheme. Typical globalization techniques such as the Armijo or Wolfe conditions [1] are not sufficient to guarantee convergence to any fixed minimizer; hence, a different approach needs to be chosen.
As a more formal setting, consider the parameterized NLP

    min_{x ∈ R^n} f(x + s),    (1)

where s ∈ R^n is a perturbation which corresponds to a "shift" of the original problem, and let x* be a local minimizer in the unperturbed case s = 0. Further, let (s^(i))_{i=1}^N ⊆ R^n be a sequence of perturbations and consider the corresponding sequence of NLPs, in which the solution of the previous problem serves as the initial guess for the current one:

    x*(i) = local solution of min_{x ∈ R^n} f(x + s^(i)) computed from the initial guess x*(i−1),    i = 1, …, N,    (2)

with x*(0) = x*. We assume that each NLP in (2) is solved with the same local optimization method. For the process arising from Eq. (2), our question is the following: what is the maximum perturbation magnitude d (in the sense of the Euclidean norm ‖·‖₂) such that, if ‖s^(i)‖₂ < d for all i = 1, 2, …, N, then Eq. (2) yields the sequence of solutions (x* − s^(i))_{i=1}^N? In other words, we want to guarantee convergence to the same (suitably shifted) minimizer x*.
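The warm-starting scheme of Eq. (2) can be sketched numerically. In the following Python snippet, the target function, the shifts s^(i), and the fixed step size are our own illustrative assumptions, not taken from the paper; the function has two local minimizers, and the sequence tracks the shifted minimizers x* − s^(i):

```python
import numpy as np

# Assumed example target with two local minimizers, (+1, 0) and (-1, 0);
# it stands in for the (unspecified) f of the paper.
def grad_f(x):
    # Gradient of f(x) = (x1^2 - 1)^2 + x2^2.
    return np.array([4.0 * x[0] * (x[0]**2 - 1.0), 2.0 * x[1]])

def gdm(grad, x0, step=0.05, tol=1e-10, max_iter=10_000):
    # Plain gradient descent with a fixed step size gamma_k = step.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g
    return x

# Warm-started sequence of shifted problems min_x f(x + s^(i)), Eq. (2):
# the solution of problem i-1 is the initial guess for problem i.
x_star = np.array([1.0, 0.0])
shifts = [np.array([0.05, 0.0]), np.array([0.0, -0.05]), np.array([0.03, 0.02])]
x_prev = x_star.copy()
for s in shifts:
    x_prev = gdm(lambda x, s=s: grad_f(x + s), x_prev)
    print(x_prev, "expected:", x_star - s)
```

Because the shifts are small relative to the basin of the tracked minimizer, each warm start lands inside the attraction region of x* − s^(i) and no "jump" to the second minimizer occurs.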
To answer the question above, we need to find the maximum radius r* of the ball B(x*; r*) := {x ∈ R^n : ‖x − x*‖₂ < r*} such that for all x₀ ∈ B(x*; r*), the sequence of points {x_k}_k produced by the local method starting from x₀ converges to x*. This problem can be formulated as follows:

    r* = sup { r > 0 : g(x) = 0 for all x ∈ B(x*; r) },    (3)

where g is a certain non-negative function related to the chosen optimization algorithm.
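The radius r* can also be approximated by brute force: sample initial guesses on circles of growing radius around x* and check whether the local method returns to x*. The sketch below is an illustration under our own assumptions (the example function, plain gradient descent as the local method, and the sampling grid are not from the paper):

```python
import numpy as np

def grad_f(x):
    # Gradient of the assumed example f(x) = (x1^2 - 1)^2 + x2^2,
    # with local minimizers (+1, 0) and (-1, 0).
    return np.array([4.0 * x[0] * (x[0]**2 - 1.0), 2.0 * x[1]])

def gdm(x0, step=0.02, tol=1e-8, max_iter=5000):
    # Fixed-step gradient descent used as the local method.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g
    return x

def convergence_radius(x_star, r_grid, n_theta=24):
    # Largest sampled r such that GDM returns to x_star from every
    # sampled initial guess on the circle of radius r around x_star.
    r_best = 0.0
    for r in r_grid:
        thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
        ring = x_star + r * np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
        if all(np.linalg.norm(gdm(x0) - x_star) < 1e-4 for x0 in ring):
            r_best = r
        else:
            break
    return r_best

x_star = np.array([1.0, 0.0])
r_conv = convergence_radius(x_star, np.linspace(0.1, 1.2, 12))
print(r_conv)
```

For this example the estimate stops just below r = 1: the circle of radius 1 around (1, 0) touches the origin, a stationary point from which gradient descent cannot reach x*.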

Application to Gradient Descent Method
Let us now consider a particular application of Problem (3). Let f : R^2 → R be a target function, and let x* be a local minimizer.
As the optimization algorithm, we choose the gradient descent method (GDM)

    x_{k+1} = x_k − γ_k ∇f(x_k),    γ_k > 0.    (4)

In this case, we answer a slightly weaker question, namely, we find the ball of maximum radius r* such that the points {x_k}_k produced by GDM (4) satisfy x_k ∈ B(x*; r*) for all k = 1, 2, … whenever x₀ ∈ B(x*; r*).

Fix an angle ᾱ ∈ (0, π/2). For x ≠ x* with ∇f(x) ≠ 0, we require that the angle α between v := x − x* and ∇f(x) be at most ᾱ (see Fig. 1). Hence, there exists t > 0 such that p(t) := x − t∇f(x) lies on the circle of radius ‖v‖₂ centered at x*, and for each s ∈ (0, t) we have ‖x* − p(s)‖₂ < ‖x* − x‖₂. Formally, for the variation 0 ≠ v ∈ R^n we require that

    ⟨v, ∇f(x)⟩ ≥ cos(ᾱ) ‖v‖₂ ‖∇f(x)‖₂.

In Problem (3), we can then find a radius r* (which, in general, can be infinite) such that the condition "α ≤ ᾱ or ∇f(x) = 0" holds uniformly in B(x*; r*). Furthermore, for finite r*, the inequality

    γ_k ‖∇f(x_k)‖₂ ≤ r* cos(ᾱ)    ∀k = 1, 2, …

must additionally be imposed on the step sizes so that the iterates cannot leave B(x*; r*). Fig. 2 also shows the iterative process of GDM for two different initial guesses. As one can see, the disks B(M₁; r*₁) and B(M₂; r*₂) are indeed maximal, i.e., choosing the initial guess even slightly outside of B(M₂; r*₂) causes the corresponding GDM scheme to converge to a different solution.
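The uniform angle condition can be checked numerically. In the sketch below, the example function, the choice ᾱ = π/3, and the sampling grid are our own assumptions; we search for the largest sampled radius r such that ⟨x − x*, ∇f(x)⟩ ≥ cos(ᾱ)‖x − x*‖₂‖∇f(x)‖₂ holds at every sampled point of B(x*; r):

```python
import numpy as np

def grad_f(x):
    # Gradient of the assumed example f(x) = (x1^2 - 1)^2 + x2^2,
    # with local minimizers (+1, 0) and (-1, 0).
    return np.array([4.0 * x[0] * (x[0]**2 - 1.0), 2.0 * x[1]])

def angle_ok(x, x_star, alpha_bar):
    # Condition "alpha <= alpha_bar or grad = 0": at x, the gradient must
    # point away from x_star within the fixed angle alpha_bar.
    v = x - x_star
    g = grad_f(x)
    ng = np.linalg.norm(g)
    if ng == 0.0:
        return True
    return np.dot(v, g) >= np.cos(alpha_bar) * np.linalg.norm(v) * ng

def estimate_radius(x_star, alpha_bar, r_max=2.0, n_r=200, n_theta=360):
    # Grid search: largest sampled r such that the angle condition holds
    # on every sampled circle of radius <= r centered at x_star.
    r_star = 0.0
    for r in np.linspace(r_max / n_r, r_max, n_r):
        thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
        ring = x_star + r * np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
        if all(angle_ok(x, x_star, alpha_bar) for x in ring):
            r_star = r
        else:
            break
    return r_star

r_est = estimate_radius(np.array([1.0, 0.0]), alpha_bar=np.pi / 3)
print(r_est)
```

Near the second critical point the gradient becomes almost orthogonal to x − x*, so the condition fails before the circle of radius 1 is reached; the estimate is therefore strictly smaller than the distance to the neighboring basin, as the sufficient condition is conservative.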

Future Work
In this work, we formulate and analyze a model of sequential NLPs and show how the maximal radius of the ball that is invariant with respect to GDM can be obtained. In the future, we aim to provide a link to exemplary real-world applications such as model predictive control, in which consecutive optimal control problems are typically similar to each other [3]. Moreover, the above-mentioned ideas have to be extended to methods other than GDM, in particular to constrained optimization algorithms. Finally, a clear connection to the field of robust optimization [4] needs to be established.