2019-07-29 探索空間で定義された目的関数をパラメータ空間で定義された関数に変換する際に、従来は一般的に期待値を取る方法が使われれいたが、平均は目的関数の単調増加な関数による写像に対して不変量ではないのでquantileを用いた定義に変えた
http://www.jmlr.org/papers/volume18/14-467/14-467.pdf
We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space 𝑋 into a continuous-time black-box optimization method on 𝑋, the information-geometric optimization (IGO) method.
Invariance as a major design principle keeps the number of arbitrary choices to a minimum.
The resulting IGO flow is the flow of an ordinary differential equation conducting the natural gradient ascent of an adaptive, time-dependent transformation of the objective function.
It makes no particular assumptions on the objective function to be optimized.
The IGO method produces explicit IGO algorithms through time discretization.
It naturally recovers versions of known algorithms and offers a systematic way to derive new ones.
All these algorithms are natural instances of, and unified under, the single information-geometric optimization framework.
The IGO method achieves, thanks to its intrinsic formulation, maximal invariance properties:
Theoretical considerations strongly suggest that IGO algorithms are essentially characterized by a minimal change of the distribution over time. Therefore they have minimal loss in diversity through the course of optimization, provided the initial diversity is high. First experiments using restricted Boltzmann machines confirm this insight. As a simple consequence, IGO seems to provide, from information theory, an elegant way to simultaneously explore several valleys of a fitness landscape in a single run.
search space: X
objective function$f: X \to R$
black-box senario: f(x)を求める以外のfに関する知識は得られないものとする
invariance
For continuous search spaces, invariance under translation of the coordinate system is standard in optimization.
Invariance under general affine-linear changes of the coordinates has been
これらは探索空間に対する不変量
一方、目的関数に対する単調増加な変換
iteratively update a probability distribution $𝑃_\theta$ defined on 𝑋, parametrized by a set of parameters $\theta$
estimation of distribution algorithms (EDA)
Euler time discretization
多分(1)の付いている場所、1つずれてる
2.2.1