Rohrbach, Anna (Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society)
Fukui, A., Park, D. H., Yang, D., Rohrbach, A., Darrell, T., & Rohrbach, M. (2016). Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 457-468). Stroudsburg, PA: ACL. Retrieved from https://aclweb.org/anthology/D/D16/D16-1044.pdf.