This master's thesis investigates the benefit of utilizing depth
information acquired by a time-of-flight (ToF) camera for hand shape
recognition from unrestricted viewpoints. Specifically, we assess the
hypothesis that classical 3D content descriptors might be
inappropriate for ToF depth images due to the 2.5D nature and
noisiness of the data and the potentially expensive computations in 3D space.
Instead, we extend 2D descriptors to make use of the additional
semantics of depth images. Our system is based on the appearance-based
retrieval paradigm, using a synthetic 3D hand model to generate its
database. The system runs at interactive frame rates. For
increased robustness, no color, intensity, or temporal coherence
information is used. A novel, domain-specific algorithm for segmenting
the forearm from the upper body based on reprojecting the acquired
geometry into the lateral view is introduced. Moreover, three kinds of
depth-based descriptors are proposed, and the design choices behind
them are supported experimentally. The whole system is then
evaluated on an American Sign Language (ASL) fingerspelling dataset.
However, the retrieval performance still leaves room for improvement.
Several insights and possible reasons for this are discussed.