Abstract:
Automatic recovery of 3D human pose from monocular image sequences is a
challenging and important research topic with numerous applications. Although
current methods are able to recover 3D pose for a single person in controlled
environments, they are severely challenged by real-world scenarios, such as
crowded street scenes. To address this problem, we propose a three-stage
process building on a number of recent advances. The first stage obtains an
initial estimate of the 2D articulation and viewpoint of the person from single
frames. The second stage allows early data association across frames based on
tracking-by-detection. These two stages successfully accumulate the available
2D image evidence into robust estimates of 2D limb positions over short image
sequences (= tracklets). The third and final stage uses those tracklet-based
estimates as robust image observations to reliably recover 3D pose. We
demonstrate state-of-the-art performance on the HumanEva II benchmark, and also
show the applicability of our approach to articulated 3D tracking in realistic
street conditions.