<ul>
<li><a href="/en/Theory%20and%20Algorithms%20for%20the%20Bandit%20Problem">Theory and Algorithms for the Bandit Problem</a> p.38 <a href="/en/Thompson%20extraction">Thompson extraction</a></li>
<li>Bayesian estimation of the expected value</li>
<li>Choose each action with probability equal to the probability that its expected value is the maximum among all actions (<a href="/en/random%20dither">random dither</a>)</li>
<li>However, instead of computing this &quot;probability of being the maximum expected value&quot; directly, use a randomized algorithm</li>
<li>Because the estimation is Bayesian, it yields a distribution over each action's expected value. Draw one sample from each of these distributions</li>
<li>Select the action whose sampled value is the largest</li>
<li>This achieves &quot;choosing each action with the probability that its expected value is the maximum&quot; without ever computing that probability explicitly.</li>
</ul>
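<p>The sample-then-pick-the-largest procedure in the list above can be sketched as follows. This is only a minimal illustration, assuming Bernoulli (0/1) rewards and a uniform Beta(1, 1) prior on each action's success rate; neither assumption comes from the original note, and the names <code>thompson_step</code> and <code>run_bandit</code> are hypothetical.</p>

```python
import random


def thompson_step(successes, failures, rng):
    """Pick one action by sampling once from each action's posterior.

    Assumption: rewards are 0/1 and the prior is Beta(1, 1), so the
    posterior over an action's expected value is Beta(s + 1, f + 1).
    """
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    # Select the action whose sampled value is the largest.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling against hypothetical Bernoulli arms."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_probs[arm].
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.9], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per action:", pulls)
```

<p>Because each action wins the argmax exactly as often as its sample happens to be the largest, no integral over the posteriors is needed. In this simulation the pulls should concentrate on the action with the higher success rate as evidence accumulates.</p>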
<a href="https://hagino3000.blogspot.com/2015/07/thompson-sampling.html">https://hagino3000.blogspot.com/2015/07/thompson-sampling.html</a>
<a href="https://hagino3000.blogspot.com/2016/12/linear-bandit.html">https://hagino3000.blogspot.com/2016/12/linear-bandit.html</a>
#Reinforcement Learning
<hr>
This page is auto-translated from <a href="https://scrapbox.io/nishio/%E3%83%88%E3%83%B3%E3%83%97%E3%82%BD%E3%83%B3%E3%82%B5%E3%83%B3%E3%83%97%E3%83%AA%E3%83%B3%E3%82%B0">/nishio/トンプソンサンプリング</a> using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at <a href="https://twitter.com/nishio_en">@nishio_en</a>. I&#39;m very happy to share my thoughts with non-Japanese readers.

Thompson sampling