
Meeting Abstract

Extending DeepGaze II: Scanpath prediction from deep features

MPG Authors

Kümmerer, M
Research Group Computational Vision and Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;


Bethge, M
Max Planck Institute for Biological Cybernetics, Max Planck Society;
Research Group Computational Vision and Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Full texts (restricted access)
No full texts are currently released for your IP range.
Full texts (freely accessible)
No freely accessible full texts are available in PuRe.
Supplementary material (freely accessible)
No freely accessible supplementary materials are available.
Citation

Kümmerer, M., Wallis, T., & Bethge, M. (2018). Extending DeepGaze II: Scanpath prediction from deep features. In 18th Annual Meeting of the Vision Sciences Society (VSS 2018) (pp. 105-106).


Citation link: https://hdl.handle.net/21.11116/0000-0001-7E3D-F
Abstract
Predicting where humans choose to fixate can help in understanding a variety of human behaviours. Recent years have seen substantial progress in predicting spatial fixation distributions when viewing static images. Our own model "DeepGaze II" (Kümmerer et al., ICCV 2017) extracts deep neural network features from input images using the pretrained VGG network and uses a simple pixelwise readout network to predict fixation distributions from these features. DeepGaze II is state-of-the-art for predicting free-viewing fixation densities according to the established MIT Saliency Benchmark. However, DeepGaze II predicts only spatial fixation distributions rather than scanpaths, and the model therefore ignores crucial structure in the fixation selection process. Here we extend DeepGaze II to predict fixation densities conditioned on the previous scanpath. We add additional feature maps encoding the previous scanpath (e.g. the distance of image pixels to previous fixations) to the input of the readout network. Except for these few additional feature maps, the architecture is exactly that of DeepGaze II. The model is trained on ground-truth human fixation data (MIT1003) using maximum-likelihood optimization. Even using only the last fixation location increases performance by approximately 30% relative to DeepGaze II and reproduces the strong spatial fixation clustering effect reported previously (Engbert et al., JoV 2015). This contradicts the way Inhibition of Return has often been used in computational models of fixation selection. Using a history of two fixations increases performance further, and the model learns a suppression effect around the earlier fixation location. Due to the probabilistic nature of our model, we can sample new scanpaths from it that capture the statistics of human scanpaths much better than scanpaths sampled from a purely spatial distribution. The modular architecture of our model allows us to explore the effects of many different possible factors on fixation selection.
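As a rough illustration of the architecture described above (a sketch only, not the authors' implementation; PyTorch is used here, and the helper names, layer sizes, and activation choices are assumptions), the previous scanpath can be encoded as extra feature maps, concatenated with the frozen deep features, passed through a pixelwise (1x1-convolution) readout, and trained by maximizing the log-likelihood of the observed next fixation:

```python
# Sketch only: conditional fixation-density readout with scanpath feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

def scanpath_feature_maps(prev_fixations, height, width):
    """Encode previous fixations as feature maps, e.g. the Euclidean distance
    of every pixel to each previous fixation given as (y, x) coordinates."""
    ys = torch.arange(height, dtype=torch.float32).view(height, 1).expand(height, width)
    xs = torch.arange(width, dtype=torch.float32).view(1, width).expand(height, width)
    maps = [torch.sqrt((ys - fy) ** 2 + (xs - fx) ** 2) for fy, fx in prev_fixations]
    return torch.stack(maps)  # (n_previous_fixations, H, W)

class ConditionalReadout(nn.Module):
    """Pixelwise (1x1 convolution) readout from frozen deep features plus the
    scanpath feature maps, returning a log-density over pixels."""
    def __init__(self, n_deep_features, n_scanpath_maps, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_deep_features + n_scanpath_maps, hidden, kernel_size=1),
            nn.Softplus(),
            nn.Conv2d(hidden, hidden, kernel_size=1),
            nn.Softplus(),
            nn.Conv2d(hidden, 1, kernel_size=1),
        )

    def forward(self, deep_features, scanpath_maps):
        x = torch.cat([deep_features, scanpath_maps], dim=1)  # (B, C + n_maps, H, W)
        logits = self.net(x)
        b, _, h, w = logits.shape
        # Normalize over all pixels so the output is a proper probability distribution.
        return F.log_softmax(logits.view(b, -1), dim=1).view(b, 1, h, w)

def next_fixation_nll(log_density, fy, fx):
    """Maximum-likelihood objective for one scanpath step: the negative
    log-probability of the observed next fixation (fy, fx)."""
    return -log_density[0, 0, fy, fx]
```

In training, one such loss term would be accumulated for every fixation in every scanpath of the training data, with the deep features held fixed and only the readout parameters optimized.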
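Because the model defines a conditional distribution over the next fixation given the scanpath so far, new scanpaths can be sampled autoregressively, one fixation at a time. Another hedged sketch, reusing the hypothetical helpers above and sampling at the resolution of the predicted density map:

```python
import torch

def sample_scanpath(model, deep_features, first_fixation, n_fixations,
                    height, width, history=2):
    """Sample a scanpath from p(next fixation | image features, previous fixations)."""
    fixations = [first_fixation]
    for _ in range(n_fixations - 1):
        maps = scanpath_feature_maps(fixations[-history:], height, width).unsqueeze(0)
        log_density = model(deep_features, maps)        # (1, 1, H, W)
        probs = log_density.exp().view(-1)              # flatten to a pixel distribution
        idx = torch.multinomial(probs, 1).item()        # draw one pixel index
        fixations.append((idx // width, idx % width))   # convert back to (y, x)
    return fixations
```

Scanpaths sampled this way reflect the learned conditional structure (clustering around recent fixations, suppression around earlier ones), in contrast to samples drawn independently from a purely spatial fixation density.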