Loss Surface / Optimization Trajectory: Visualize: Figures 1, 3, 4, 6, and 7. Description: "Training loss surfaces of training from scratch (top) and fine-tuning BERT (bottom) on four datasets. We annotate the start point (i.e., initialized model) and the end point (i.e., estimated model) in the loss surfaces."
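
Illustrative sketch (not the paper's code): one common way to draw such a surface is to evaluate the loss on a 2D plane through parameter space, with the start point at (0, 0) and the end point at (1, 0). The toy logistic-regression model, the random second direction, and all names below (`loss_surface`, `theta_start`, `theta_end`, `direction`) are assumptions made for illustration; the source only states that the start and end points are annotated on the surfaces.

```python
import numpy as np

def loss(theta, X, y):
    """Mean logistic loss of a linear classifier with weights theta (labels in {-1, +1})."""
    return float(np.mean(np.logaddexp(0.0, -y * (X @ theta))))

def grad(theta, X, y):
    """Gradient of the mean logistic loss with respect to theta."""
    z = X @ theta
    return X.T @ (-y / (1.0 + np.exp(y * z))) / len(y)

def loss_surface(theta_start, theta_end, direction, X, y, alphas, betas):
    """Evaluate loss(theta_start + a * (theta_end - theta_start) + b * direction) on a grid.

    Rows index betas, columns index alphas, so (a, b) = (0, 0) is the start
    point and (a, b) = (1, 0) is the end point.
    """
    d1 = theta_end - theta_start
    surface = np.empty((len(betas), len(alphas)))
    for i, b in enumerate(betas):
        for j, a in enumerate(alphas):
            surface[i, j] = loss(theta_start + a * d1 + b * direction, X, y)
    return surface

# Toy data and a toy "training run" (assumed setup, stands in for a real model).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5))

theta_start = rng.normal(size=5)          # "initialized model"
theta_end = theta_start.copy()
for _ in range(500):                      # plain gradient descent
    theta_end -= 0.5 * grad(theta_end, X, y)   # ends at the "estimated model"

direction = rng.normal(size=5)            # second axis: a random direction (assumption)
alphas = np.linspace(-0.5, 1.5, 21)
betas = np.linspace(-1.0, 1.0, 21)
surface = loss_surface(theta_start, theta_end, direction, X, y, alphas, betas)
```

The resulting `surface` array can be passed to any contour or 3D plotting routine, with the start and end points marked at (0, 0) and (1, 0) on the (alpha, beta) axes.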