
Record

  Seeing with Humans: Gaze-Assisted Neural Image Captioning

Sugano, Y., & Bulling, A. (2016). Seeing with Humans: Gaze-Assisted Neural Image Captioning. Retrieved from http://arxiv.org/abs/1608.05203.


Basic data

Genre: Research Paper
LaTeX: Seeing with Humans: {G}aze-Assisted Neural Image Captioning

Files

arXiv:1608.05203.pdf (Preprint), 3MB
Name: arXiv:1608.05203.pdf
Description: File downloaded from arXiv at 2016-10-28 11:29
OA status: -
Visibility: Public
MIME type / checksum: application/pdf / [MD5]
Technical metadata: -
Copyright date: -
Copyright info: -

Creators

Creators:
Sugano, Yusuke 1, Author
Bulling, Andreas 1, Author
Affiliations:
1 Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society, ou_1116547

Content

Keywords: Computer Science, Computer Vision and Pattern Recognition, cs.CV
Abstract: Gaze reflects how humans process visual scenes and is therefore increasingly used in computer vision systems. Previous works demonstrated the potential of gaze for object-centric tasks, such as object localization and recognition, but it remains unclear if gaze can also be beneficial for scene-centric tasks, such as image captioning. We present a new perspective on gaze-assisted image captioning by studying the interplay between human gaze and the attention mechanism of deep neural networks. Using a public large-scale gaze dataset, we first assess the relationship between state-of-the-art object and scene recognition models, bottom-up visual saliency, and human gaze. We then propose a novel split attention model for image captioning. Our model integrates human gaze information into an attention-based long short-term memory architecture, and allows the algorithm to allocate attention selectively to both fixated and non-fixated image regions. Through evaluation on the COCO/SALICON datasets we show that our method improves image captioning performance and that gaze can complement machine attention for semantic scene understanding tasks.
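
The split attention idea summarized in the abstract lends itself to a compact illustration. Below is a minimal, hypothetical PyTorch sketch of one attention step that scores image regions against the decoder LSTM state and then pools fixated and non-fixated regions separately, using a human gaze map as a soft mask. All module names, dimensions, and the sigmoid gate are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SplitAttention(nn.Module):
    """Toy split-attention step: separate attention pools over
    gaze-fixated and non-fixated image regions (illustrative only)."""

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.proj_v = nn.Linear(feat_dim, attn_dim)    # project region features
        self.proj_h = nn.Linear(hidden_dim, attn_dim)  # project LSTM state
        self.score = nn.Linear(attn_dim, 1)            # additive attention score
        self.gate = nn.Linear(hidden_dim, 1)           # mixes the two contexts

    def forward(self, V, h, gaze):
        # V: (batch, regions, feat_dim) CNN region features
        # h: (batch, hidden_dim) decoder LSTM hidden state
        # gaze: (batch, regions) fixation density in [0, 1]
        e = self.score(torch.tanh(self.proj_v(V) + self.proj_h(h).unsqueeze(1))).squeeze(-1)
        eps = 1e-8
        # Log-space masking biases one softmax toward fixated regions
        # (gaze) and the other toward non-fixated regions (1 - gaze).
        a_fix = torch.softmax(e + torch.log(gaze + eps), dim=1)
        a_non = torch.softmax(e + torch.log(1.0 - gaze + eps), dim=1)
        c_fix = (a_fix.unsqueeze(-1) * V).sum(dim=1)   # fixated context
        c_non = (a_non.unsqueeze(-1) * V).sum(dim=1)   # non-fixated context
        beta = torch.sigmoid(self.gate(h))             # how much to rely on gaze
        return beta * c_fix + (1.0 - beta) * c_non

# Dummy usage: a 7x7 grid of 512-d features, batch of 2.
attn = SplitAttention(feat_dim=512, hidden_dim=256, attn_dim=128)
V = torch.randn(2, 49, 512)
h = torch.randn(2, 256)
gaze = torch.rand(2, 49)
context = attn(V, h, gaze)  # (2, 512), fed back into the captioning LSTM
```

The key design point this sketch captures is that attention is allocated over both fixated and non-fixated regions rather than restricted to where humans looked, with a learned gate deciding how much each pool contributes at every decoding step.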

Details

Language(s): eng - English
Date: 2016-08-18
Publication status: Published online
Pages: 8 p.
Place, publisher, edition: -
Table of contents: -
Type of review: -
Identifiers: arXiv: 1608.05203
URI: http://arxiv.org/abs/1608.05203
BibTeX citekey: Sugano1608.05203
Degree type: -
