Free keywords:
Cognition, Design automation, Detectors, Estimation, Optimization, Solid modeling, Training
geometry, image representation, object recognition, object tracking, teaching
2D bounding box localization, 3D geometric reasoning, 3D geometry teaching, 3D object tracking, 3D scene understanding, Pascal VOC, benchmark data set, deformable part model, higher-level application, individual object detection, object class detector output, object class recognition system, object hypothesis, representational gap, scene understanding system, ultra-wide baseline matching, viewpoint estimation
Abstract:
Current object class recognition systems typically target 2D bounding box localization, encouraged by benchmark data sets such as Pascal VOC. While this seems suitable for the detection of individual objects, higher-level applications such as 3D scene understanding or 3D object tracking would benefit from more fine-grained object hypotheses incorporating 3D geometric information, such as viewpoints or the locations of individual parts. In this paper, we help narrow the representational gap between the ideal input of a scene understanding system and object class detector output by designing a detector particularly tailored towards 3D geometric reasoning. In particular, we extend the successful discriminatively trained deformable part models to include both estimates of viewpoint and 3D parts that are consistent across viewpoints. We experimentally verify that adding 3D geometric information comes at minimal performance loss with respect to 2D bounding box localization, while the resulting detector outperforms prior work in 3D viewpoint estimation and ultra-wide baseline matching.