Attention mechanism (Attention)
Generalization as of 2018
$\mathrm{Attention}(query, Keys, Values) = \mathrm{Normalize}(F(query, Keys)) \cdot Values$
There are a query and Keys; Keys is a collection of multiple keys.
There is a function F that takes a query and Keys as arguments and returns the intensity of attention for each key.
The results are then normalized in some way to sum to 1, giving the attention intensity for each key (roughly softmax; see Hard attention mechanism).
The Values are then averaged, weighted by their attention intensities.
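The steps above can be sketched as follows, a minimal NumPy sketch; the scoring function `additive_F` is a hypothetical stand-in (a real model would learn its parameters):

```python
import numpy as np

def softmax(x):
    # Normalize the scores so they sum to 1 (the "Normalize" step).
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention(query, Keys, Values, F):
    # Step 1: F scores each key independently against the query.
    scores = np.array([F(query, key) for key in Keys])
    # Step 2: normalize the scores into attention intensities.
    weights = softmax(scores)
    # Step 3: weighted average of the Values by those intensities.
    return weights @ Values

# Hypothetical additive-style scoring function, for illustration only
# (a real model would learn projection weights instead of a fixed tanh).
def additive_F(query, key):
    return float(np.tanh(query + key).sum())

query = np.array([0.1, 0.2])
Keys = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
Values = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
out = attention(query, Keys, Values, additive_F)
```

Note that `F` is swapped in as an argument: the generalization says nothing about what F is, only that it scores one (query, key) pair at a time.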
schematic
F does not know the number of keys: $F(query, key)$ scores one key at a time, so it does not depend on the shape of Keys.
[f(query, key) for key in Keys]

2014: additive attention [1409.0473 Neural Machine Translation by Jointly Learning to Align and Translate]
By letting the decoder have an attention mechanism, we relieve the encoder from the burden of having to encode all information in the source sentence into a fixed-length vector. With this new approach the information can be spread throughout the sequence of annotations, which can be selectively retrieved by the decoder accordingly.
2015: dot-product attention [1508.04025 Effective Approaches to Attention-based Neural Machine Translation https://arxiv.org/abs/1508.04025]
The idea that F(query, key) is simply the inner product of the query and the key.
$\mathrm{Attention}(query, Key, Value) = \mathrm{Softmax}(query \cdot Key) \cdot Value$
Of course, this inner product is sometimes expressed as a matrix product in some papers.
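As a sketch of that matrix-product view: stacking the keys as rows of a matrix collapses the per-key inner products into a single matrix-vector product (assuming NumPy; the row layout and toy numbers are illustrative choices, not from the papers):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def dot_product_attention(query, Key, Value):
    # F(query, key) = query . key; with the keys stacked as rows of Key,
    # all the inner products become one matrix-vector product.
    scores = Key @ query          # one score per key
    weights = softmax(scores)     # attention intensities, sum to 1
    return weights @ Value        # weighted average of the value rows

query = np.array([1.0, 0.0])
Key = np.array([[1.0, 0.0], [0.0, 1.0]])
Value = np.array([[10.0, 0.0], [0.0, 10.0]])
out = dot_product_attention(query, Key, Value)
# The first key matches the query better, so the output leans toward
# the first value row.
```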
Related: bilinear.
Initially, the attention mechanism was envisioned to be used in combination with RNNs
In the Encoder-Decoder configuration, the Encoder's hidden states are stored, and the attention mechanism selects from among those hidden states.
In this configuration, Key and Value come from Encoder and query comes from Decoder.
This type of configuration is called [Source Target Attention].
Key and Value together are called the Memory.
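A minimal sketch of this configuration with hypothetical toy numbers: the encoder hidden states serve as both Key and Value (the Memory), and the decoder's current hidden state is the query:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical encoder hidden states, one row per source token.
# They act as both Key and Value -- together, the Memory.
encoder_states = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0]])

# The decoder's current hidden state plays the role of the query.
decoder_state = np.array([0.0, 2.0, 0.0])

# Dot-product attention over the Memory.
weights = softmax(encoder_states @ decoder_state)
context = weights @ encoder_states  # context vector returned to the decoder
```

Here the query attends most strongly to the second encoder state, so the context vector is dominated by that state.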
Related: self-attention (Self-attention).
Old commentary
This page is auto-translated from /nishio/注意機構 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.