Figure 6: BERT attention heads embedded in two-dimensional space. Distance between points approximately matches the average Jensen-Shannon divergence between the outputs of the corresponding heads. Heads in the same layer tend to be close together.
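The caption names the distance (average Jensen-Shannon divergence between head outputs) but not the procedure that places heads in 2D. The sketch below is one plausible reconstruction under stated assumptions, not the authors' code. It treats a head's "output" as its attention distribution for each token and averages JS divergence over tokens. As a stand-in embedding method it uses classical (Torgerson) multidimensional scaling, solved with a small hand-written Jacobi eigensolver so the example needs only the standard library. The attention maps are hypothetical toy data, and every function name here is illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions (nats, at most ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def head_distances(heads):
    """heads[h][t] is head h's attention distribution for token t (assumption).
    Returns the symmetric matrix of JS divergences averaged over tokens."""
    n = len(heads)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pairs = list(zip(heads[i], heads[j]))
            d[i][j] = d[j][i] = sum(js_divergence(p, q) for p, q in pairs) / len(pairs)
    return d


def jacobi_eigh(a, max_sweeps=50, tol=1e-20):
    """Eigendecomposition of a small symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) where column k of V is the k-th eigenvector."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def classical_mds(d, dims=2):
    """Classical MDS: points whose Euclidean distances approximate the matrix d."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    mean = [sum(row) / n for row in sq]  # row means (= column means, sq is symmetric)
    grand = sum(mean) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigh(b)
    top = sorted(range(n), key=lambda k: vals[k], reverse=True)[:dims]
    # d need not be Euclidean, so negative eigenvalues are clipped to zero.
    return [[vecs[i][k] * math.sqrt(max(vals[k], 0.0)) for k in top] for i in range(n)]


# Hypothetical toy attention maps: 4 heads x 3 tokens x 3 positions.
heads = [
    [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]],  # attends to self
    [[0.85, 0.10, 0.05], [0.10, 0.85, 0.05], [0.05, 0.10, 0.85]],  # mostly self
    [[0.05, 0.05, 0.90], [0.05, 0.05, 0.90], [0.05, 0.05, 0.90]],  # attends to last position
    [[0.10, 0.05, 0.85], [0.05, 0.10, 0.85], [0.05, 0.05, 0.90]],  # mostly last position
]
coords = classical_mds(head_distances(heads))
for h, (x, y) in enumerate(coords):
    print(f"head {h}: ({x:+.3f}, {y:+.3f})")
```

On this toy input, heads 0 and 1 land close together, as do heads 2 and 3, mirroring the clustering the figure shows. Classical MDS was chosen only because it has a closed form. A stress-minimizing MDS would be an equally reasonable reading of "approximately matches," and with real BERT attention maps one would average over many tokens and sentences.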