Visualize: Figures 4–11. Description: "we propose to use layer-wise relevance propagation (LRP) to compute the contribution of each contextual word to arbitrary hidden states in the attention-based encoder-decoder framework."
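To make the described technique concrete, below is a minimal sketch of how LRP redistributes relevance backward through a single linear layer using the epsilon-stabilized rule; propagating such a step layer by layer from a chosen hidden state down to the input embeddings yields a per-word contribution score. The function name `lrp_linear`, the NumPy formulation, and the epsilon stabilizer are illustrative assumptions, not the paper's exact propagation rules.

```python
import numpy as np

def lrp_linear(x, w, b, relevance_out, eps=1e-6):
    """Epsilon-rule LRP for one linear layer (illustrative sketch).

    Redistributes the relevance of the layer's outputs back onto its
    inputs in proportion to each input's share of the pre-activation:
        R_i = sum_j (x_i * w_ij / z_j) * R_j

    x             : input vector, shape (d_in,)
    w             : weight matrix, shape (d_in, d_out)
    b             : bias vector, shape (d_out,)
    relevance_out : relevance assigned to the outputs, shape (d_out,)
    """
    z = x @ w + b                     # pre-activations of the layer
    z = z + eps * np.sign(z)          # stabilizer: avoid division by (near-)zero
    contrib = (x[:, None] * w) / z    # fraction of each output owed to each input
    return contrib @ relevance_out    # input relevance (approximately conserves sum)
```

Starting from a hidden state of interest with its relevance set to its own value (or to 1), repeatedly applying a rule of this form through the network's layers attributes the state to each contextual source word, which is what the figures referenced above visualize as relevance heatmaps.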