Keywords:
-
Abstract:
In this study, we present a biologically motivated, learning-based computer vision approach to
human pose estimation and tracking in clutter. The approach consists of two interconnected
modules: human posture estimation from monocular images and tracking a person’s location in
video footage. Full body pose estimation is approached with methods from statistical learning
theory: a mapping from biologically plausible complex features (similar to [1]) into a pose
space is learned using kernel-based techniques (i.e., Support Vector Machines and kernel ridge
regression). The pose space is derived from a human body model based on 3D joint positions.
To tackle the ambiguities inherent in the projection of a 3D scene onto a monocular image,
our approach employs a one-to-many mapping scheme that maps, in a mixture-of-experts
fashion [2], to several possible 3D poses. A key feature of the presented framework is the
feedback matching pathway which evaluates the likelihood of a generated hypothesis in an
intermediate feature space based on a robust medial axis transformation. The approach of [3]
is thereby extended to clutter. Fusing bottom-up and top-down techniques exploits the
advantages of both approaches: multiple hypotheses can be generated quickly in a feedforward
manner without losing the ability to evaluate these hypotheses in the original image
space.
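
The feature-to-pose regression described above can be illustrated with a minimal kernel ridge regression sketch. This is an illustration under stated assumptions, not the implementation used in this study: the RBF kernel, the values of `gamma` and `lam`, the feature dimension, and the synthetic data are all hypothetical, and the function names are invented for this example.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Squared Euclidean distances between every row of A and every row of B.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_krr(X, Y, gamma=0.5, lam=1e-3):
    # Closed-form kernel ridge solution: coef = (K + lam * I)^-1 Y.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def predict_krr(X_train, coef, X_new, gamma=0.5):
    # Each predicted pose is a kernel-weighted combination of the coefficients.
    return rbf_kernel(X_new, X_train, gamma) @ coef

# Synthetic stand-ins (assumptions): 8-D feature vectors and a pose made of
# two joints with three coordinates each, flattened to 6 numbers.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
W = rng.normal(size=(8, 6))
Y = np.tanh(X @ W)

coef = fit_krr(X, Y)
Y_hat = predict_krr(X, coef, X)
train_rmse = float(np.sqrt(np.mean((Y_hat - Y) ** 2)))
```

A single regressor of this kind yields one pose per image; a one-to-many scheme would train several such regressors and gate between them, which this sketch does not attempt.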
Tracking is investigated as the problem of finding the bounding box of a person throughout a
video sequence, taking into account possible shape deformations. Building on the ability to track
a person, a temporal filtering framework with natural-movement constraints is employed to
further disambiguate competing hypotheses and to arrive at a stable and robust pose estimate.
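
One way to picture temporal disambiguation among several per-frame hypotheses is a dynamic-programming (Viterbi-style) search for the smoothest high-scoring sequence. The abstract does not specify the filter, so the following is a swapped-in sketch under assumptions: the squared-difference smoothness penalty, the `smooth` weight, and the toy one-dimensional "poses" are hypothetical.

```python
import numpy as np

def select_smooth_path(hyps, scores, smooth=1.0):
    """Pick one hypothesis per frame.

    hyps:   (T, K, D) array of K candidate poses per frame.
    scores: (T, K) array of per-hypothesis log-likelihoods.
    Returns an array of T hypothesis indices.
    """
    T, K, _ = hyps.shape
    best = scores[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # jump[i, j]: squared pose change from hypothesis i at t-1 to j at t.
        diff = hyps[t - 1][:, None, :] - hyps[t][None, :, :]
        jump = (diff ** 2).sum(-1)
        total = best[:, None] - smooth * jump
        back[t] = total.argmax(0)
        best = total.max(0) + scores[t]
    # Trace the best sequence backwards from the final frame.
    path = np.empty(T, dtype=int)
    path[-1] = best.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Toy data (assumption): a slowly changing true pose and an erratic distractor,
# stored at varying hypothesis indices and equally likely in every frame.
true_pose = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
distractor = np.array([3.0, -3.0, 3.0, -3.0, 3.0])
true_idx = np.array([0, 1, 0, 1, 1])
hyps = np.empty((5, 2, 1))
for t in range(5):
    hyps[t, true_idx[t], 0] = true_pose[t]
    hyps[t, 1 - true_idx[t], 0] = distractor[t]
scores = np.zeros((5, 2))
path = select_smooth_path(hyps, scores)
```

In this toy case the per-frame scores carry no information, so the smoothness term alone recovers the slowly varying sequence.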
To generate the required number of training images with corresponding ground-truth pose
information, we use realistic computer graphics models driven by motion capture data and embedded
into clutter by alpha-blending. Overall, we explore the robustness of our framework to
background changes and its ability to generalize to novel actors, actions, and real-world
imagery.
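
Alpha-blending a rendered figure into a cluttered background follows the standard compositing rule out = alpha * foreground + (1 - alpha) * background. The sketch below is a minimal illustration assuming floating-point images in [0, 1] and a per-pixel alpha mask; the array sizes and values are hypothetical.

```python
import numpy as np

def alpha_blend(foreground, alpha, background):
    # Expand the (H, W) mask so it broadcasts over the color channels.
    a = alpha[..., None]
    return a * foreground + (1.0 - a) * background

# Hypothetical 4x4 RGB images: a bright rendered figure on a dark background.
fg = np.full((4, 4, 3), 0.8)
bg = np.full((4, 4, 3), 0.2)

alpha = np.zeros((4, 4))
alpha[1:3, 1:3] = 1.0   # fully opaque figure pixels
alpha[0, 0] = 0.5       # a half-transparent edge pixel

out = alpha_blend(fg, alpha, bg)
```

Opaque pixels take the figure color, transparent pixels keep the background, and fractional alpha values give soft edges between the two.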