  Seeing with Humans: Gaze-Assisted Neural Image Captioning

Sugano, Y., & Bulling, A. (2016). Seeing with Humans: Gaze-Assisted Neural Image Captioning. Retrieved from http://arxiv.org/abs/1608.05203.


Basic

Genre: Paper
Latex : Seeing with Humans: {G}aze-Assisted Neural Image Captioning

Files

arXiv:1608.05203.pdf (Preprint), 3MB
Name: arXiv:1608.05203.pdf
Description: File downloaded from arXiv at 2016-10-28 11:29
OA-Status:
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

Sugano, Yusuke (1), Author
Bulling, Andreas (1), Author
Affiliations:
(1) Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society, ou_1116547

Content

Free keywords: Computer Science, Computer Vision and Pattern Recognition, cs.CV
 Abstract: Gaze reflects how humans process visual scenes and is therefore increasingly used in computer vision systems. Previous works demonstrated the potential of gaze for object-centric tasks, such as object localization and recognition, but it remains unclear if gaze can also be beneficial for scene-centric tasks, such as image captioning. We present a new perspective on gaze-assisted image captioning by studying the interplay between human gaze and the attention mechanism of deep neural networks. Using a public large-scale gaze dataset, we first assess the relationship between state-of-the-art object and scene recognition models, bottom-up visual saliency, and human gaze. We then propose a novel split attention model for image captioning. Our model integrates human gaze information into an attention-based long short-term memory architecture, and allows the algorithm to allocate attention selectively to both fixated and non-fixated image regions. Through evaluation on the COCO/SALICON datasets we show that our method improves image captioning performance and that gaze can complement machine attention for semantic scene understanding tasks.
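Illustration: the abstract does not spell out how the split attention model is formulated, so the following is only a minimal NumPy sketch of one plausible reading. It assumes soft attention over image regions in which two separate scoring functions (one for fixated, one for non-fixated regions) are blended by a per-region gaze weight before the softmax. The names `split_attention`, `params`, and `gaze`, the additive tanh scoring form, and all dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / e.sum()


def split_attention(features, hidden, gaze, params):
    """Hypothetical gaze-split soft attention (an assumption, not the paper's code).

    features: (L, D) array, one feature vector per image region
    hidden:   (H,) array, current LSTM hidden state
    gaze:     (L,) array in [0, 1], normalized fixation density per region
    params:   dict with keys "fixated" and "non_fixated", each a tuple
              (W_a: (D, K), W_h: (H, K), w: (K,))
    Returns (alpha, context): attention weights (L,) and context vector (D,).
    """

    def score(W_a, W_h, w):
        # Additive attention score per region, shape (L,).
        return np.tanh(features @ W_a + hidden @ W_h) @ w

    e_fix = score(*params["fixated"])      # scores as if every region were fixated
    e_non = score(*params["non_fixated"])  # scores as if no region were fixated

    # Blend the two score maps by the gaze weight, then normalize.
    e = gaze * e_fix + (1.0 - gaze) * e_non
    alpha = softmax(e)
    context = alpha @ features
    return alpha, context


def random_params(rng, D, H, K):
    """Random weights for both branches, for demonstration only."""
    def branch():
        return (rng.normal(size=(D, K)), rng.normal(size=(H, K)), rng.normal(size=K))
    return {"fixated": branch(), "non_fixated": branch()}
```

In this sketch, a region with gaze weight 1 is scored only by the "fixated" branch and a region with weight 0 only by the "non-fixated" branch, so the model can still attend to regions that humans did not look at.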

Details

Language(s): eng - English
 Dates: 2016-08-18, 2016
 Publication Status: Published online
 Pages: 8 p.
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: arXiv: 1608.05203
URI: http://arxiv.org/abs/1608.05203
BibTex Citekey: Sugano1608.05203
 Degree: -
