Visualize: Figure 2. Description: "DIFFMASK learns to maskout subsets of the input while maintaining differentiability. The decision to include or disregard an input token is made with a simple model based on intermediate hidden layers of the analyzed model. "