
Released

Paper

Fast and Robust Hand Tracking Using Detection-Guided Optimization

MPS-Authors
Sridhar, Srinath
Computer Graphics, MPI for Informatics, Max Planck Society

Mueller, Franziska
Computer Graphics, MPI for Informatics, Max Planck Society

Theobalt, Christian
Computer Graphics, MPI for Informatics, Max Planck Society

Fulltext (public)

arXiv:1602.04124.pdf (Preprint), 4 MB

Citation

Sridhar, S., Mueller, F., Oulasvirta, A., & Theobalt, C. (2016). Fast and Robust Hand Tracking Using Detection-Guided Optimization. Retrieved from http://arxiv.org/abs/1602.04124.


Cite as: https://hdl.handle.net/11858/00-001M-0000-002B-9A76-9
Abstract
Markerless tracking of hands and fingers is a promising enabler for human-computer interaction. However, adoption has been limited because of tracking inaccuracies, incomplete coverage of motions, low framerates, complex camera setups, and high computational requirements. In this paper, we present a fast method for accurately tracking rapid and complex articulations of the hand using a single depth camera. Our algorithm uses a novel detection-guided optimization strategy that increases the robustness and speed of pose estimation. In the detection step, a randomized decision forest classifies pixels into parts of the hand. In the optimization step, a novel objective function combines the detected part labels with a Gaussian mixture representation of the depth to estimate the pose that best fits the depth data. Our approach needs comparatively few computational resources, which makes it extremely fast (50 fps without GPU support). It also supports varying camera-to-scene arrangements, both static and moving. We demonstrate the benefits of our method by evaluating it on public datasets and comparing against previous work.
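
The two-step pipeline described in the abstract can be summarized in a short sketch: a pre-trained randomized decision forest labels depth pixels with hand parts, and an optimizer then fits pose parameters by minimizing an energy that combines a Gaussian-mixture depth term with agreement to the detected labels. The Python sketch below is a minimal illustration under stated simplifying assumptions (a hypothetical one-Gaussian-per-part hand model, scikit-learn's RandomForestClassifier standing in for the paper's per-pixel forest, and a toy energy); it is not the authors' implementation.

import numpy as np
from scipy.optimize import minimize
from sklearn.ensemble import RandomForestClassifier

N_PARTS = 6  # assumed number of hand-part labels (hypothetical)

def detect_parts(forest, depth_features):
    # Detection step: classify each depth pixel (one feature row per pixel)
    # into one of N_PARTS hand parts using the pre-trained forest.
    return forest.predict(depth_features)

def energy(pose, depth_points, point_labels, sigma=0.02, w_label=1.0):
    # Toy hand model: the pose vector is one 3D Gaussian center per part.
    centers = pose.reshape(N_PARTS, 3)
    # Depth term: every observed 3D point should lie near some model Gaussian
    # (negative log-likelihood of a uniform-weight Gaussian mixture).
    d2 = ((depth_points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    depth_term = -np.log(np.exp(-d2 / (2.0 * sigma**2)).sum(axis=1) + 1e-12).mean()
    # Label term: the nearest Gaussian of each point should carry the part
    # label that the forest detected for that point.
    nearest_part = d2.argmin(axis=1)
    label_term = (nearest_part != point_labels).mean()
    return depth_term + w_label * label_term

def track_frame(forest, depth_features, depth_points, pose_init):
    # Detection-guided optimization: detect labels once, then fit the pose.
    labels = detect_parts(forest, depth_features)
    result = minimize(energy, pose_init, args=(depth_points, labels),
                      method="Powell")  # derivative-free: toy energy is non-smooth
    return result.x

# Usage with synthetic data (a real forest would be trained offline on
# labeled depth images; features and points here are random stand-ins).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
train_labels = rng.integers(0, N_PARTS, size=200)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(features, train_labels)
points = rng.normal(scale=0.1, size=(200, 3))
pose0 = rng.normal(scale=0.1, size=N_PARTS * 3)
pose = track_frame(forest, features, points, pose0)

A derivative-free optimizer is used here only because the toy label term is non-smooth; the paper's actual objective is constructed so that pose estimation runs far more efficiently, which is what enables the reported 50 fps without GPU support.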