Visualize: Figures 5 & 6.
Description: "Hypotheses: Attention (S3), Prediction (S4), or Beam Search (S5)
Error – encoder words and decoder words (E/D), attention (S3), top-k
predictions for each time step in the decoder (S4), and the beam search tree
(S5)"