Different types of Attention
\(s_t\) is the target hidden state and \(h_i\) are the source hidden states; each is a column vector of shape (n,1). \(c_t\) is the final context vector, and \(\alpha_{t,i}\) is the alignment score of source position \(i\) at target step \(t\).
\[\begin{aligned} c_t&=\sum_{i=1}^n \alpha_{t,i}h_i \\ \alpha_{t,i}&= \frac{\exp(\mathrm{score}(s_t,h_i))}{\sum_{j=1}^n \exp(\mathrm{score}(s_t,h_j))} \end{aligned}\]
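The two equations above can be sketched in NumPy. The score function here is the dot product, which is only one of several choices (an assumption for illustration); the softmax over scores gives the alignment weights, and the context vector is the weighted sum of source states.

```python
import numpy as np

def attention_context(s_t, H):
    """One step of attention.

    s_t : (d,) target hidden state
    H   : (n, d) matrix whose rows are the n source hidden states
    Uses the dot-product score, one of several possible score functions.
    Returns the alignment weights alpha (n,) and context vector c_t (d,).
    """
    scores = H @ s_t                                # score(s_t, h_i) for each i
    scores -= scores.max()                          # shift for softmax stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax -> alignment weights
    c_t = alpha @ H                                 # weighted sum of source states
    return alpha, c_t
```

The weights always sum to 1, so `c_t` is a convex combination of the source hidden states.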
Global (Soft) vs Local (Hard) #
Global attention takes all source hidden states into account, while local attention only attends to a subset of the source hidden states, typically a window around a chosen source position.
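A minimal sketch of the local variant, assuming a fixed window of radius `D` centered at a given source position `p_t` (the predictive version of local attention learns `p_t` and adds a Gaussian falloff; that is omitted here):

```python
import numpy as np

def local_attention_context(s_t, H, p_t, D=2):
    """Local attention over the window [p_t - D, p_t + D].

    s_t : (d,) target hidden state
    H   : (n, d) matrix whose rows are the n source hidden states
    p_t : center of the attention window (assumed given here)
    Uses the dot-product score inside the window; states outside
    the window get zero weight.
    """
    n = H.shape[0]
    lo, hi = max(0, p_t - D), min(n, p_t + D + 1)
    scores = H[lo:hi] @ s_t                 # score only within the window
    scores -= scores.max()                  # softmax stability
    alpha_win = np.exp(scores) / np.exp(scores).sum()
    alpha = np.zeros(n)                     # weights outside the window are 0
    alpha[lo:hi] = alpha_win
    c_t = alpha @ H                         # context from the window only
    return alpha, c_t
```

Compared with global attention, only `2D + 1` score evaluations are needed per target step, which is the main computational appeal for long source sequences.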