[CNN], which could only accept fixed-length input, has been replaced by the attention mechanism, which can accept variable-length input.
Why can the attention mechanism handle variable-length input?
A CNN hard-codes, in the form of a weight matrix, which weight is applied to the value at each position relative to itself.
The attention mechanism instead computes the weights to multiply the values by from the input itself (from the similarity between positions).
So there is no need to fix the number of input elements in advance.
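A minimal NumPy sketch of this contrast (illustrative only: the kernel values, dimensions, and the projection-free self_attention function are assumptions for the example, not any library's API):

```python
import numpy as np

# CNN: the weight applied at each relative position is fixed ahead of time,
# so the kernel (and the receptive field) has a predetermined size.
kernel = np.array([0.25, 0.5, 0.25])        # hypothetical hard-coded weights

def conv1d(x, kernel):
    k = len(kernel)
    return np.array([kernel @ x[i:i + k] for i in range(len(x) - k + 1)])

# Self-attention (no learned projections, for brevity): the weights are
# computed from the input itself via dot products, so any number of
# input elements can be handled.
def self_attention(X):
    scores = X @ X.T / np.sqrt(X.shape[1])          # pairwise similarities
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over positions
    return weights @ X                              # weighted sum of the values

x = np.random.randn(9)        # 1-D signal
X = np.random.randn(9, 4)     # 9 positions, 4-dimensional features
print(conv1d(x, kernel).shape)    # (7,)   -- output length tied to the fixed kernel
print(self_attention(X).shape)    # (9, 4) -- one output per input, any length works
```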
On the other hand, because the simple form of the attention mechanism contains no positional information, it returns the same values even if the input sequence is shuffled.
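To see this concretely, here is a small check with the same illustrative self_attention as above: shuffling the inputs only shuffles the outputs, and each element's own output value is unchanged.

```python
import numpy as np

def self_attention(X):
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

X = np.random.randn(7, 4)
perm = np.random.permutation(len(X))

# Each output depends only on the *set* of inputs, not on their order,
# so shuffling the inputs merely shuffles the outputs the same way.
print(np.allclose(self_attention(X)[perm], self_attention(X[perm])))  # True
```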
The Transformer therefore combines it with [Positional Encoding].
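A minimal sketch of the sinusoidal Positional Encoding from "Attention Is All You Need", which is added to the embeddings so that position information reaches the attention layers (the function name and dimensions here are illustrative):

```python
import numpy as np

def positional_encoding(length, d_model):
    """PE[pos, 2i] = sin(pos / 10000**(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(length)[:, None]            # (length, 1)
    i = np.arange(0, d_model, 2)[None, :]       # even feature indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((length, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

X = np.random.randn(7, 8)                       # 7 tokens, 8-dim embeddings
X_pos = X + positional_encoding(7, 8)           # now the order of the inputs matters
```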
This page is auto-translated from /nishio/CNNと自己注意 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.