Abstract\\
It has been shown that learning from high-level visual descriptions or visual
properties of objects, also known as \hlight{attributes}, can play an effective
role in recognition systems. However, two challenging problems remain open:
first, how to scale the use of attributes to a large number of categories, and
second, how to select attributes for a given recognition task.
\\In this thesis, we focus on these two open problems. We explore the automatic
discovery of attributes from image and text data to scale the applicability
of attributes to large collections of images and text documents. In addition,
we identify which attributes are relevant for a recognition task by evaluating
them based on how well they can be distinguished by a recognition system.
\\We address these problems with two approaches. In the first approach, which
is based on the work of Berg et al.~\sdcite{Berg:2010:AAD:1886063.1886114}, we
extract attributes from text on the web and rank them according to how well
they can be distinguished by a discriminatively trained SVM.
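The ranking idea in the first approach can be illustrated with a minimal sketch: score each candidate attribute by the cross-validated accuracy of an SVM that separates images with the attribute from images without it, then sort by that score. The attribute names, feature dimensions, and data below are entirely made up for illustration; this is not the thesis's actual pipeline.

```python
# Hypothetical sketch: rank candidate attributes by how well a linear SVM
# can discriminate them, using synthetic features (not real image data).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def attribute_score(X, y):
    """Mean cross-validated accuracy of an SVM trained on attribute labels."""
    return cross_val_score(LinearSVC(), X, y, cv=5).mean()

# Two toy attributes: one whose features separate the classes ("striped"),
# one that is pure noise ("lucky"). Names are illustrative only.
n, d = 200, 10
y = rng.integers(0, 2, n)
X_good = rng.normal(0.0, 1.0, (n, d)) + 2.0 * y[:, None]  # informative
X_bad = rng.normal(0.0, 1.0, (n, d))                      # uninformative

scores = {
    "striped": attribute_score(X_good, y),
    "lucky": attribute_score(X_bad, y),
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # the discriminable attribute ranks first
```

A discriminable attribute yields near-perfect cross-validation accuracy on this toy data, while the noise attribute scores near chance, so the ranking keeps it at the bottom.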
\\In contrast, the second approach uses a generative technique, namely a
\hlight{topic model}, to discover textual attributes and the semantically
correlated visual attributes based on co-occurrence statistics. To this end,
three different models are proposed that differ in how they leverage
text-to-image relationships.
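The co-occurrence idea behind the second approach can be sketched with a standard topic model (LDA) run over joint documents that mix textual words with quantized visual words: terms that co-occur end up in the same topic. The vocabulary, the visual-word tokens (`vw_17`, `vw_42`), and the count matrix below are invented for illustration; this is not one of the thesis's three proposed models.

```python
# Hypothetical sketch: LDA over joint text/visual-word documents, so each
# topic groups a textual attribute with its co-occurring visual word.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

vocab = ["striped", "vw_17", "furry", "vw_42"]  # text terms + visual words

# Toy document-term counts: docs 0-2 pair "striped" with visual word vw_17,
# docs 3-5 pair "furry" with vw_42 (made-up block structure).
X = np.array([
    [5, 4, 0, 0],
    [6, 5, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 6, 5],
    [0, 0, 4, 6],
])

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
topics = [
    [vocab[i] for i in comp.argsort()[::-1][:2]]  # top-2 terms per topic
    for comp in lda.components_
]
print(topics)
```

On this block-structured toy data, each topic concentrates on one textual term and its correlated visual word, which is the co-occurrence signal the generative approach exploits.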
\\Both approaches are evaluated qualitatively and quantitatively. The
qualitative evaluation shows that both approaches can discover
human-understandable attributes. The quantitative evaluation demonstrates that
the two approaches perform comparably in terms of discovering discriminative
attributes. Furthermore, the generative model can localize all parts of a
visual attribute with multiple patches, whereas the discriminative model
highlights only the predominant part of each visual attribute with a single
patch.