Previous work focused on predicting visual search targets from human
fixations but, in the real world, a specific target is often not known, e.g.
when searching for a present for a friend. In this work we instead study the
problem of predicting the mental picture, i.e. only an abstract idea instead of
a specific target. This task is significantly more challenging given that
mental pictures of the same target category can vary widely depending on
personal biases, and given that characteristic target attributes can often not
be verbalised explicitly. We instead propose to use gaze information as
implicit information on users' mental picture and present a novel gaze pooling
layer to seamlessly integrate semantic and localized fixation information into
a deep image representation. We show that we can robustly predict both the
mental picture's category as well as attributes on a novel dataset containing
fixation data of 14 users searching for targets on a subset of the DeepFahion
dataset. Our results have important implications for future search interfaces
and suggest deep gaze pooling as a general-purpose approach for gaze-supported
computer vision systems.