Attention is the psychological process by which we focus focal point on certain events or information and ignore others. In general, our awareness is limited resources and we cannot pay attention to all information at the same time. Therefore, our brain deals with information overload situations by focusing on information that is deemed important in the moment.
Note A is defined with query q, key k, and value v as follows
$A(Q, K, V) = \mathrm{softmax}(QK^T)V$
Additive and Intra-product Caution
$A(Q, K, V) = \mathrm{softmax}({QK^T \over \sqrt{d_k}})V$
This page is auto-translated from /nishio/アテンション using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.