2019-07-29 When converting an objective function defined in the search space to a function defined in the parameter space, the method of taking the expectation value was generally used in the past, but since the mean is not an invariant to the mapping of the objective function by a monotonically increasing function, it was changed to a definition using quantiles.
http://www.jmlr.org/papers/volume18/14-467/14-467.pdf
We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space 𝑋 into a continuous-time black-box optimization method on 𝑋, the information-geometric optimization (IGO) method.
Invariance as a major design principle keeps the number of arbitrary choices to a minimum.
The resulting IGO flow is the flow of an ordinary differential equation conducting the natural gradient ascent of an adaptive, time-dependent transformation of the objective function.
It makes no particular assumptions on the objective function to be optimized.
The IGO method produces explicit IGO algorithms through time discretization.
It naturally recovers versions of known algorithms and offers a systematic way to derive new ones.
All these algorithms are natural instances of, and unified under, the single information-geometric optimization framework.
The IGO method achieves, thanks to its intrinsic formulation, maximal invariance properties:
Theoretical considerations strongly suggest that IGO algorithms are essentially characterized by a minimal change of the distribution over time. Therefore they have minimal loss in diversity through the course of optimization, provided the initial diversity is high. First experiments using restricted Boltzmann machines confirm this insight. As a simple consequence, IGO seems to provide, from information theory, an elegant way to simultaneously explore several valleys of a fitness landscape in a single run.
search space: X
objective function$f: X \to R$
black-box senario: no knowledge about f other than finding f(x) is assumed
invariance
For continuous search spaces, invariance under translation of the coordinate system is standard in optimization.
Invariance under general affine-linear changes of the coordinates has been
These are invariants to the search space
On the other hand, monotonically increasing transformations for the objective function
iteratively update a probability distribution $𝑃_\theta$ defined on 𝑋, parametrized by a set of parameters $\theta$
estimation of distribution algorithms (EDA)
Euler time discretization
Maybe the place with (1) is off by one.
2.2.1
This page is auto-translated from [/nishio/Information-Geometric Optimization](https://scrapbox.io/nishio/Information-Geometric Optimization) using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.