Gradient Search (gradsearch)

Keywords: gradsearch gradient descent search optimization

This module implements a gradient or "steepest-descent" search. Given a function \(f\) which operates on a vector\(\vec{x} = [x_0,x_1,\ldots,x_{N-1}]^T\) of \(N\) parameters, the gradient search method seeks to find the optimum \(\vec{x}\) which minimizes \(f(\vec{x})\) .

Theory

The gradient search is an iterative method and adjusts \(\vec{x}\) proportional to the negative of the gradient of \(f\) evaluated at the current location. The vector \(\vec{x}\) is adjusted by

$$ \Delta \vec{x}[n+1] = -\gamma[n] \nabla f(\vec{x}[n]) $$

where \(\gamma[n]\) is the step size and\(\nabla f(\vec{x}[n])\) is the gradient of \(f\) at \(\vec{x}\) , at the \(n^{th}\) iteration. The gradient is a vector field which points to the greatest rate of increase, and is computed at \(\vec{x}\) as

$$ \nabla f(\vec{x}) = \left( \frac{\partial f}{\partial x_0}, \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_{N-1}} \right) $$

In most non-linear optimization problems, \(\nabla f(\vec{x})\) is not known, and must be approximated for each value of \(\vec{x}[n]\) using the finite element method. The partial derivative of the \(k^{th}\) component is estimated by computing the slope of \(f\) when \(x_k\) is increased by a small amount \(\Delta\) while holding all other elements of \(\vec{x}\) constant. This process is repeated for all elements in \(\vec{x}\) to compute the gradient vector. Mathematically, the \(k^{th}\) component of the gradient is approximated by

$$ \frac{\partial f(\vec{x})}{\partial x_k} \approx \frac{f(x_0,\ldots,x_k+\Delta,\ldots,x_{N-1}) - f(\vec{x})}{\Delta} $$

Once \(\nabla f(\vec{x}[n])\) is known, \(\Delta\vec{x}[n+1]\) is computed and the optimizing vector is updated via

$$ \vec{x}[n+1] = \vec{x}[n] + \Delta\vec{x}[n+1] $$

Momentum constant

When \(f(\vec{x})\) is flat (i.e. \(\nabla f(\vec{x})\approx \vec{0}\) ), convergence will be slow. This effect can be mitigated by permitting the update vector equation to retain a small portion of the previous step vector. The updated vector at time \(n+1\) is

$$ \vec{x}[n+1] = \vec{x}[n] + \Delta\vec{x}[n+1] + \alpha\Delta\vec{x}[n] $$

where \(\Delta\vec{x}[0] = \vec{0}\) . The effective update at time \(n+1\) is

$$ \vec{x}[n+1] = \sum_{k=0}^{n+1}{\alpha^{k}\Delta\vec{x}[n+1-k]} $$

which is stable only for \(0 \le \alpha \lt 1\) . For flat regions, the gradient vector \(\nabla f(\vec{x})\) is approximately a constant \(\Delta\vec{x}\) , and \(\vec{x}[n]\) therefore becomes a geometric series converging to \(\Delta\vec{x}/(1-\alpha)\) . This accelerates the algorithm across relatively flat regions of \(f\) . The momentum constant additionally adds some stability for regions where the gradient method tends to oscillate, such as steep valleys in \(f\) .

Step size adjustment

In liquid , the gradient is normalized to unity (orthonormal). That is \(\|\nabla f(\vec{x}[n])\|=1\) . Furthermore, \(\gamma\) is slightly reduced each epoch by a multiplier \(\mu\)

$$ \gamma[n+1] = \mu \gamma[n] $$

This helps improve stability and convergence over regions where the algorithm might oscillate due to steep values of \(f\) .

Interface

Here is a summary of the parameters used in the gradient search algorithm and their default values:

  • \(\Delta\) : step size in computing the gradient (default \(10^{-6}\) )
  • \(\gamma\) : step size in updating \(\vec{x}[n]\) (default 0.002)
  • \(\alpha\) : momentum constant (default 0.1)
  • \(\mu\)

Here is the basic interface to the gradsearch object:

  • gradsearch_create(*userdata,*v,n,utility,min/max,*props) creates a gradient search object designed to optimize an \(n\) -point vector \(\vec{v}\) . The user-defined utility function and userdata structures define the search, as well as the min/max flag which can be either LIQUID_OPTIM_MINIMIZE or LIQUID_OPTIM_MAXIMIZE . Finally, the search is parametrized by the props structure; if set to NULL the defaults will be used. When run the gradsearch object will update the "optimal" value in the input vector \(\vec{v}\) specified during create() .
  • gradsearch_destroy(q) destroys a gradsearch object, freeing all internally-allocated memory.
  • gradsearch_print(q) prints the internal state of the gradient search object.
  • gradsearch_reset(q) resets the internal state of the gradient search object.
  • gradsearch_step(q) steps through a single iteration of the gradient search. The result is stored in the original input vector \(\vec{v}\) specified during the create() method.
  • gradsearch_execute(q,n,target_utility) runs multiple iterations of the search algorithm, stopping after either \(n\) iterations or if the target utility is met.

Here is an example of how the gradient_search is used:

#include <liquid/liquid.h>

// user-defined utility callback function
float myutility(void * _userdata, float * _v, unsigned int _n)
{
    float u = 0.0f;
    unsigned int i;
    for (i=0; i<_n; i++)
        u += _v[i] * _v[i];
    return u;
}

int main() {
    unsigned int num_parameters = 8;    // search dimensionality
    unsigned int num_iterations = 100;  // number of iterations to run
    float target_utility = 0.01f;       // target utility

    float v[num_parameters];            // optimum vector

    // ... intialize v ...

    // create gradsearch object
    gradsearch gs = gradsearch_create(NULL,
                                      v,
                                      num_parameters,
                                      &myutility,
                                      LIQUID_OPTIM_MINIMIZE);

    // execute batch search
    gradsearch_execute(gs, num_iterations, target_utility);

    // clean it up
    gradsearch_destroy(gs);
}

Notice that the utility function is a callback that is completely defined by the user.

doc/gradsearch/gradsearch_example.png

Figure [fig-optim-gradsearch]. gradsearch performance for 2-parameter Rosenbrock function \(f(x,y) = (1-x)^2 + 100(y-x^2)^2\) with a starting point of \((x_0,y_0)=(-0.1,1.4)\) . The minimum is located at \((1,1)\) .

[ref:fig-optim-gradsearch] depicts the performance of the gradient search for the Rosenbrock function, defined as\(f(x,y) = (1-x)^2 + 100(y-x^2)^2\) for input parameters \(x\) and \(y\) . The Rosenbrock function has a minimum at \((x,y)=(1,1)\) ; however the minimum lies in a deep valley which can be difficult to navigate. From the figure it is apparent that finding the valley is trivial, but convergence to the minimum is slow.