Visualize: Figures 1 & 8.
Description: "An example of explanation and reasoning in VQA. We first extract attributes in the image such as “sit”, “phone” and “woman.” A caption is also generated to encode the relationship between these attributes, e.g. “woman sitting on a bench.” Then a reasoning module uses these explanations to predict an answer “talking on the phone.”"
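
The data flow in the description can be sketched as a three-stage pipeline: attribute extraction, caption generation, then reasoning over both. This is only an illustrative sketch, not the authors' implementation. The names `Explanation`, `answer_with_explanation`, and the three stub functions are hypothetical, the stubs return the example values from the caption instead of running real models, and the question string is an assumption, since the description does not state it. The sketch also assumes the caption is generated from the image, which the description does not spell out.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Explanation:
    """Hypothetical container for the two kinds of explanation."""
    attributes: List[str]  # word-level: e.g. "sit", "phone", "woman"
    caption: str           # sentence-level: relates the attributes


def answer_with_explanation(
    image: Any,
    question: str,
    extract_attributes: Callable[[Any], List[str]],
    generate_caption: Callable[[Any], str],
    reason: Callable[[str, Explanation], str],
) -> Tuple[str, Explanation]:
    """Run the three stages in order and return the answer with its explanation."""
    explanation = Explanation(
        attributes=extract_attributes(image),
        caption=generate_caption(image),
    )
    # The reasoning stage sees only the question and the explanations.
    answer = reason(question, explanation)
    return answer, explanation


# Stub stages (assumptions): they return the example values from the
# description rather than calling any real model.
def stub_attributes(image: Any) -> List[str]:
    return ["sit", "phone", "woman"]


def stub_caption(image: Any) -> str:
    return "woman sitting on a bench"


def stub_reason(question: str, explanation: Explanation) -> str:
    return "talking on the phone"


answer, explanation = answer_with_explanation(
    image=None,                           # placeholder; no real image is loaded
    question="What is the woman doing?",  # assumed question, not from the source
    extract_attributes=stub_attributes,
    generate_caption=stub_caption,
    reason=stub_reason,
)
print(explanation.attributes, "|", explanation.caption, "->", answer)
```

Passing the stages in as callables keeps the sketch independent of any particular model: each stub could be swapped for a real attribute classifier, captioner, or reasoning network without changing the pipeline function.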