Visualize: Figures 1 & 8. Description: "An example of explanation and reasoning in VQA. We first extract attributes in the image such as “sit”, “phone” and “woman.” A caption is also generated to encode the relationship between these attributes, e.g. “woman sitting on a bench.” Then a reasoning module uses these explanations to predict an answer “talking on the phone.”"
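
The data flow in the description above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `Explanation`, `explain`, and `answer`, the three toy stand-in functions, and the example question are all hypothetical, and the real attribute extractor, captioner, and reasoning module would be learned models. The sketch also assumes the reasoning module receives the question together with the explanations, which the description does not state.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Explanation:
    """Intermediate, human-readable evidence extracted from the image."""
    attributes: List[str]  # e.g. words such as "sit", "phone", "woman"
    caption: str           # sentence relating the attributes to each other


def explain(
    image: Any,
    attribute_extractor: Callable[[Any], List[str]],
    captioner: Callable[[Any], str],
) -> Explanation:
    """Step 1: build the explanation (attributes + caption) from the image."""
    return Explanation(
        attributes=attribute_extractor(image),
        caption=captioner(image),
    )


def answer(
    question: str,
    explanation: Explanation,
    reasoner: Callable[[str, List[str], str], str],
) -> str:
    """Step 2: predict an answer from the explanation, not from raw pixels."""
    return reasoner(question, explanation.attributes, explanation.caption)


# --- Toy stand-ins (hypothetical; real modules would be trained models) ---
def toy_attribute_extractor(image: Any) -> List[str]:
    return ["sit", "phone", "woman"]


def toy_captioner(image: Any) -> str:
    return "woman sitting on a bench"


def toy_reasoner(question: str, attributes: List[str], caption: str) -> str:
    # Hard-coded rule purely to make the data flow visible.
    if "doing" in question.lower() and "phone" in attributes:
        return "talking on the phone"
    return "unknown"


# The image is unused by the toy stand-ins, so None is passed here.
explanation = explain(None, toy_attribute_extractor, toy_captioner)
prediction = answer("What is the woman doing?", explanation, toy_reasoner)
print(explanation)
print(prediction)
```

The point of the sketch is the separation of stages: the explanation is produced first and can be inspected on its own, and the answer is then derived from that explanation.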