View-based Models of 3D Object Recognition and Class-specific Invariances

Logothetis, NK; Vetter, T; Hurlbert, A; Poggio, T

Released

Report

MPS-Authors

There are no MPG-Authors in the publication available

External Resource

http://dspace.mit.edu/handle/1721.1/6625
(Publisher version)

Fulltext (public)

There are no public fulltexts available

Supplementary Material (public)

There is no public supplementary material available

Citation

Logothetis, N., Vetter, T., Hurlbert, A., & Poggio, T. (1994). View-based Models of 3D Object Recognition and Class-specific Invariances (A.I. Memo 1472). Cambridge, MA, USA: Massachusetts Institute of Technology: Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-ED3E-6

Abstract

This paper describes the main features of a view-based model of object recognition. The model tries to capture general properties to be expected in a biological architecture for object recognition. The basic module is a regularization network in which each of the hidden units is broadly tuned to a specific view of the object to be recognized. The network output, which may be largely view independent, is first described in terms of some simple simulations. The following refinements and details of the basic module are then discussed: (1) some of the units may represent only components of views of the object (the optimal stimulus for the unit, its "center", is effectively a complex feature); (2) the units' properties are consistent with the usual description of cortical neurons as tuned to multidimensional optimal stimuli; (3) in learning to recognize new objects, preexisting centers may be used and modified, but also new centers may be created incrementally so as to provide maximal invariance; (4) modules are part of a hierarchical structure: the output of a network may be used as one of the inputs to another, in this way synthesizing increasingly complex features and templates; (5) in several recognition tasks, in particular at the basic level, a single center using view-invariant features may be sufficient. Modules of this type can deal with recognition of specific objects, for instance a specific face under various transformations such as those due to viewpoint and illumination, provided that a sufficient number of example views of the specific object are available. An architecture for 3D object recognition, however, must cope to some extent even when only a single model view is given. The main contribution of this paper is an outline of a recognition architecture that deals with objects of a nice class undergoing a broad spectrum of transformations due to illumination, pose, expression and so on by exploiting prototypical examples. A nice class of objects is a set of objects with sufficiently similar transformation properties under specific transformations, such as viewpoint transformations. For nice object classes, we discuss two possibilities: (a) class-specific transformations are to be applied to a single model image to generate additional virtual example views, thus allowing some degree of generalization beyond what a single model view could otherwise provide; (b) class-specific, view-invariant features are learned from examples of the class and used with the novel model image, without an explicit generation of virtual examples.
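
Illustrative sketch (not part of the memo): the basic module described in the abstract, a regularization network whose hidden units are each broadly tuned to one stored view, can be approximated with Gaussian units centered on example views and output weights obtained from a regularized linear solve. Everything specific below is an assumption made for illustration only: the random point-cloud "objects", the orthographic projection, rotation about a single axis, the names `render_view`, `unit_activations`, `fit_weights` and `network_output`, the heuristic for the tuning width, and the value of the regularization constant.

```python
import numpy as np

def render_view(points, angle_deg):
    """Orthographic 2D view of 3D points after rotation about the vertical (y) axis."""
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), 0.0, np.sin(a)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(a), 0.0, np.cos(a)]])
    rotated = points @ rot.T
    return rotated[:, :2].ravel()  # keep (x, y) of every point, flattened to one vector

def unit_activations(x, centers, sigma):
    """Gaussian response of each hidden unit; one unit per stored view (its "center")."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def fit_weights(centers, targets, sigma, lam):
    """Solve (G + lam * I) c = targets, where G[i, j] is unit j's response to center i."""
    G = np.stack([unit_activations(t, centers, sigma) for t in centers])
    return np.linalg.solve(G + lam * np.eye(len(centers)), targets)

def network_output(x, centers, weights, sigma):
    """Weighted sum of the view-tuned unit responses."""
    return float(unit_activations(x, centers, sigma) @ weights)

# Hypothetical toy objects: 8 random 3D feature points each (an assumption).
rng = np.random.default_rng(0)
target_object = rng.standard_normal((8, 3))
distractor_object = rng.standard_normal((8, 3))

# Store nine example views of the target, 15 degrees apart, as the centers.
train_angles = np.arange(-60, 61, 15)
centers = np.stack([render_view(target_object, a) for a in train_angles])

# Heuristic tuning width (an assumption): mean distance between neighboring views.
sigma = float(np.mean(np.linalg.norm(np.diff(centers, axis=0), axis=1)))
lam = 1e-3  # small regularization constant, chosen arbitrarily

# Train the network to output 1 for every stored view of the target.
weights = fit_weights(centers, np.ones(len(centers)), sigma, lam)

# Probe with views that were never stored.
novel_target = network_output(render_view(target_object, 7.5), centers, weights, sigma)
novel_distractor = network_output(render_view(distractor_object, 7.5), centers, weights, sigma)
print(f"target at unseen angle:     {novel_target:.3f}")
print(f"distractor at unseen angle: {novel_distractor:.3f}")
```

In this toy setting the output should stay close to 1 for unseen views of the target that lie between stored views, and close to 0 for a different object. This is the sense in which the output of such a module can be largely view independent, provided enough example views are available.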