While appearance-based gaze estimation methods have traditionally relied on
information from the eye region alone, recent results from a multi-region
method indicate that using the full face image can benefit performance.
Pushing this idea further, we propose an appearance-based method that, in
contrast to a long-standing line of work in computer vision, only takes the
full face image as input. Our method encodes the face image using a
convolutional neural network with spatial weights applied to the feature maps
to flexibly suppress or enhance information in different facial regions.
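For intuition, a minimal sketch of such a spatial weighting mechanism is shown below. It is an illustrative assumption, not the paper's exact architecture: the 1x1-convolution branch, layer widths, and PyTorch framing are placeholders for whatever the learned weighting actually looks like.

```python
import torch
import torch.nn as nn

class SpatialWeights(nn.Module):
    """Hypothetical sketch: learn a single HxW weight map from the CNN
    feature maps and multiply it back onto every channel, so that some
    facial regions are suppressed and others enhanced."""

    def __init__(self, in_channels: int, hidden_channels: int = 256):
        super().__init__()
        # 1x1 convolutions keep the spatial resolution and collapse the
        # channel dimension down to one weight per spatial location.
        self.weight_branch = nn.Sequential(
            nn.Conv2d(in_channels, hidden_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_channels, hidden_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_channels, 1, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, feature_maps: torch.Tensor) -> torch.Tensor:
        # feature_maps: (N, C, H, W); weights: (N, 1, H, W)
        weights = self.weight_branch(feature_maps)
        # Broadcasting applies the same spatial weight to every channel.
        return feature_maps * weights
```

In such a design, the learned map can down-weight locations that carry little gaze information (e.g., background or occluded regions) while emphasizing informative regions such as the eyes, in line with the suppress-or-enhance behavior described above.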
Through evaluation on the recent MPIIGaze and EYEDIAP gaze estimation datasets,
we show that our full-face method significantly outperforms the state of the
art for both 2D and 3D gaze estimation, achieving improvements of up to 14.3%
on MPIIGaze and 27.7% on EYEDIAP for person-independent 3D gaze estimation. We
further show that this improvement is consistent across different illumination
conditions and gaze directions and particularly pronounced for the most
challenging extreme head poses.