Visualize: Figure 1 & 4. Description: Attention in Transformer; attention-head view of GPT-2;
Visualize: Figure 2. Description: Model view of GPT-2;
Visualize: Figure 3. Description: Neuron view;
Visualize: Figure 5 & 6 & 10-12. Description: Each heatmap shows the proportion of total attention;
Visualize: Table 1-3. Description: Word-level attention;