
Item Details


Released

Conference Paper

Combining appearance and motion for human action classification in videos

MPS-Authors
/persons/resource/persons84113

Nowozin, S
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

/persons/resource/persons84037

Lampert, C
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There are no public fulltexts available
Supplementary Material (public)
There is no public supplementary material available
Citation

Dhillon, P., Nowozin, S., & Lampert, C. (2009). Combining appearance and motion for human action classification in videos. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 22-29). Piscataway, NJ, USA: IEEE Service Center.


Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-C473-3
Abstract
An important cue to high-level scene understanding is to analyze the objects in the scene and their behavior and interactions. In this paper, we study the problem of classifying activities in videos, as this is an integral component of any scene understanding system, and present a novel approach for recognizing human action categories in videos by combining information from the appearance and motion of human body parts. Our approach is based on tracking human body parts using mixture particle filters and then clustering the particles with local non-parametric clustering, thereby associating a local set of particles with each cluster mode. The trajectories of these cluster modes provide the "motion" information, while the "appearance" information is provided by statistics of the relative motion of these local sets of particles over a number of frames. We then use a "Bag of Words" model to build one histogram per video sequence from this set of robust appearance and motion descriptors. These histograms provide characteristic information that helps discriminate among various human actions and ultimately supports better understanding of the complete scene. We tested our approach on the standard KTH and Weizmann human action datasets, and the results were comparable to state-of-the-art methods. Additionally, our approach is able to distinguish activities that involve motion of the complete body from those in which only certain body parts move. In other words, our method discriminates well between activities with "global body motion", such as running and jogging, and those with "local motion", such as waving and boxing.
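To make the "Bag of Words" step concrete, the following is a minimal Python sketch of quantizing per-video appearance/motion descriptors against a learned vocabulary and building one histogram per sequence, as the abstract describes. The descriptor extraction (mixture particle filtering and mode clustering) is abstracted away; the synthetic inputs, the k-means vocabulary, and all function names here are assumptions for illustration, not the paper's actual implementation.

```python
# Sketch: Bag-of-Words histograms over appearance/motion descriptors.
# Assumes a k-means vocabulary, one common choice for BoW models;
# the paper itself does not prescribe this code.

import numpy as np
from sklearn.cluster import KMeans


def build_vocabulary(all_descriptors: np.ndarray, k: int = 100) -> KMeans:
    """Learn a k-word visual vocabulary from pooled training descriptors."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)


def video_histogram(descriptors: np.ndarray, vocab: KMeans) -> np.ndarray:
    """Assign each descriptor to its nearest word; return an L1-normalized histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(5000, 32))       # pooled training descriptors (synthetic)
    vocab = build_vocabulary(train, k=50)
    one_video = rng.normal(size=(200, 32))    # descriptors from one video sequence
    print(video_histogram(one_video, vocab).shape)  # -> (50,)
```

The resulting per-video histograms can then be fed to any standard classifier to discriminate among action categories.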