
Record

 
 
  Ask Your Neurons: A Deep Learning Approach to Visual Question Answering

Malinowski, M., Rohrbach, M., & Fritz, M. (2016). Ask Your Neurons: A Deep Learning Approach to Visual Question Answering. Retrieved from http://arxiv.org/abs/1605.02697.


Basic data

Genre: Research paper
LaTeX: Ask Your Neurons: {A} Deep Learning Approach to Visual Question Answering

Files

arXiv:1605.02697.pdf (Preprint), 6MB
Name: arXiv:1605.02697.pdf
Description: File downloaded from arXiv at 2016-07-15 12:38
OA status:
Visibility: Public
MIME type / Checksum: application/pdf / [MD5]
Technical metadata:
Copyright date: -
Copyright info: -

External references

Creators

Creators:
Malinowski, Mateusz (1), Author
Rohrbach, Marcus (2), Author
Fritz, Mario (1), Author
Affiliations:
(1) Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society, ou_1116547
(2) External Organizations, ou_persistent22

Content

Keywords: Computer Science, Computer Vision and Pattern Recognition, cs.CV; Computer Science, Artificial Intelligence, cs.AI; Computer Science, Computation and Language, cs.CL
Abstract: We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining latest advances in image representation and natural language processing, we propose Ask Your Neurons, a scalable, jointly trained, end-to-end formulation to this problem. In contrast to previous efforts, we are facing a multi-modal problem where the language output (answer) is conditioned on visual and natural language inputs (image and question). We provide additional insights into the problem by analyzing how much information is contained only in the language part for which we provide a new human baseline. To study human consensus, which is related to the ambiguities inherent in this challenging task, we propose two novel metrics and collect additional answers which extend the original DAQUAR dataset to DAQUAR-Consensus. Moreover, we also extend our analysis to VQA, a large-scale question answering about images dataset, where we investigate some particular design choices and show the importance of stronger visual models. At the same time, we achieve strong performance of our model that still uses a global image representation. Finally, based on such analysis, we refine our Ask Your Neurons on DAQUAR, which also leads to a better performance on this challenging task.

Details

Language(s): eng - English
Date: 2016-05-09; 2016
Publication status: Published online
Pages: 24 p.
Place, publisher, edition: -
Table of contents: -
Review type: -
Identifiers: arXiv: 1605.02697
URI: http://arxiv.org/abs/1605.02697
BibTeX citekey: Malinowski1605.02697
Degree type: -

Event

Decision

Project information

Source