Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor

Mueller, Franziska; Mehta, Dushyant; Sotnychenko, Oleksandr; Sridhar, Srinath; Casas, Dan; Theobalt, Christian

Please note that a newer version of this item is available:
https://pure.mpg.de/pubman/item/item_2460791_2

Released

Research paper

MPG Authors

Mueller, Franziska
Computer Graphics, MPI for Informatics, Max Planck Society;

Mehta, Dushyant
Computer Graphics, MPI for Informatics, Max Planck Society;

Sotnychenko, Oleksandr
Computer Graphics, MPI for Informatics, Max Planck Society;

Sridhar, Srinath
Computer Graphics, MPI for Informatics, Max Planck Society;

Theobalt, Christian
Computer Graphics, MPI for Informatics, Max Planck Society;

External Resources

No external resources have been deposited

Full texts (freely accessible)

arXiv:1704.02201.pdf
(Preprint), 7 MB

Supplementary material (freely accessible)

No freely accessible supplementary material is available

Citation

Mueller, F., Mehta, D., Sotnychenko, O., Sridhar, S., Casas, D., & Theobalt, C. (2017). Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor. Retrieved from http://arxiv.org/abs/1704.02201.

Citation link: https://hdl.handle.net/11858/00-001M-0000-002D-8BBD-F

Abstract

We present an approach for real-time, robust and accurate hand pose estimation from moving egocentric RGB-D cameras in cluttered real environments. Existing methods typically fail for hand-object interactions in cluttered scenes imaged from egocentric viewpoints, common for virtual or augmented reality applications. Our approach uses two subsequently applied Convolutional Neural Networks (CNNs) to localize the hand and regress 3D joint locations. Hand localization is achieved by using a CNN to estimate the 2D position of the hand center in the input, even in the presence of clutter and occlusions. The localized hand position, together with the corresponding input depth value, is used to generate a normalized cropped image that is fed into a second CNN to regress relative 3D hand joint locations in real time. For added accuracy, robustness and temporal stability, we refine the pose estimates using a kinematic pose tracking energy. To train the CNNs, we introduce a new photorealistic dataset that uses a merged reality approach to capture and synthesize large amounts of annotated data of natural hand interaction in cluttered scenes. Through quantitative and qualitative evaluation, we show that our method is robust to self-occlusion and occlusions by objects, particularly in moving egocentric perspectives.
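
The abstract says that the localized hand position and its depth value are used to generate a normalized cropped image for the second CNN. The record does not give the details of that step, so the following is only a minimal sketch of one plausible reading: the crop window is sized with a pinhole camera model so that it covers a fixed metric extent, and depth values are expressed relative to the depth at the hand center. The function names, the focal length, the box size and the output resolution are all illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple

# Depth image in millimetres; 0.0 marks a missing measurement (assumption).
Depth = List[List[float]]


def crop_half_size_px(center_depth_mm: float,
                      focal_px: float = 475.0,   # placeholder focal length
                      box_mm: float = 250.0) -> int:
    """Half side length (pixels) of a square spanning box_mm at this depth."""
    # Pinhole projection: size_px = focal_px * size_mm / depth_mm
    return max(1, round(focal_px * (box_mm / 2.0) / center_depth_mm))


def normalized_crop(depth: Depth,
                    center: Tuple[int, int],
                    out_size: int = 8,
                    focal_px: float = 475.0,
                    box_mm: float = 250.0) -> List[List[float]]:
    """Crop around center=(column, row) and normalize depth to [-1, 1]."""
    if out_size < 2:
        raise ValueError("out_size must be at least 2")
    u, v = center
    z = depth[v][u]
    if z <= 0.0:
        raise ValueError("no valid depth at the hand center")

    half = crop_half_size_px(z, focal_px, box_mm)
    rows, cols = len(depth), len(depth[0])
    crop: List[List[float]] = []
    for j in range(out_size):
        # Nearest-neighbour sampling from row v - half to row v + half.
        src_v = v - half + (2 * half * j) // (out_size - 1)
        line: List[float] = []
        for i in range(out_size):
            src_u = u - half + (2 * half * i) // (out_size - 1)
            inside = 0 <= src_v < rows and 0 <= src_u < cols
            d = depth[src_v][src_u] if inside else 0.0
            if d <= 0.0:
                # Missing or out-of-image pixels are treated as far background.
                line.append(1.0)
            else:
                # Depth relative to the hand center, scaled by half the box.
                rel = (d - z) / (box_mm / 2.0)
                line.append(max(-1.0, min(1.0, rel)))
        crop.append(line)
    return crop
```

In this sketch a hand closer to the camera yields a larger pixel window, so the resampled crop shows the hand at roughly the same scale regardless of distance. The first CNN would supply `center`, and the returned grid would be the input to the second CNN.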