(2.2.3.2) Optimism in face of uncertainty

" is proposed to solve the exploration-exploitation trade-off. Many algorithms in reinforcement learning that solve the tradeoffs follow this principle. (*8)

 that judges that a slot machine that hits with a high probability as a low probability. On the contrary, even if it is a slot machine that does not have a high probability of hit, it is sometimes misunderstood that it is a well-winning slot machine by hitting when they first tried it. Let's call this an 

In the case of optimistic misunderstanding, they play that slot machine again. After several times of play. They find that the probability is not as good as the experience.

They do not have a chance to correct pessimistic misunderstandings, but they have a chance to correct optimistic misunderstanding later. In other words, pessimistic misunderstandings are more severe than optimistic misunderstandings. Therefore, by making judgments optimistic, they are well-balanced. It is the principle of "

*8: Bubeck, S. and Cesa-Bianchi, N., 2012. 

s. Foundations and Trends® in Machine Learning, 5(1), pp.1-122.

The principle of "[optimism in the face of uncertainty]" is proposed to solve the exploration-exploitation trade-off. Many algorithms in reinforcement learning that solve the tradeoffs follow this principle. (*8)

In [(2.2.3) We can not compare magnitude when there is an uncertain factor?], I explained an example of a [pessimistic misunderstanding] that judges that a slot machine that hits with a high probability as a low probability. On the contrary, even if it is a slot machine that does not have a high probability of hit, it is sometimes misunderstood that it is a well-winning slot machine by hitting when they first tried it. Let's call this an [optimistic misunderstanding].

They do not have a chance to correct pessimistic misunderstandings, but they have a chance to correct optimistic misunderstanding later. In other words, pessimistic misunderstandings are more severe than optimistic misunderstandings. Therefore, by making judgments optimistic, they are well-balanced. It is the principle of "[optimism in the face of uncertainty]."

	*8: Bubeck, S. and Cesa-Bianchi, N., 2012. [Regret analysis] of stochastic and nonstochastic [multi-armed bandit problem]s. Foundations and Trends® in Machine Learning, 5(1), pp.1-122.

Related Pages