Despite significant recent advances in image classification, fine-grained
classification remains a challenge. In the present paper, we address the
zero-shot and few-shot learning scenarios, since labeled data is especially
difficult to obtain for fine-grained classification tasks. First, we embed
state-of-the-art image descriptors in a label embedding space using side
information such as attributes. We argue that learning a joint embedding space
that maximizes the compatibility between the input and output embeddings is
highly effective for zero/few-shot learning. We show empirically that such
embeddings significantly outperform the current state-of-the-art methods on
two challenging datasets (Caltech-UCSD Birds and Animals with Attributes).
Second, to reduce the amount of costly manual attribute annotations, we use
alternative output embeddings based on word-vector representations obtained
from large text corpora without any supervision. We report that such
unsupervised embeddings achieve encouraging results and lead to further
improvements when combined with the supervised ones.
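
To make the idea of compatibility between input and output embeddings
concrete, the following is a minimal sketch of zero-shot prediction. It rests
on assumptions that are not stated in the text above: the compatibility is
taken to be bilinear, the matrix W is taken to be already learned, and several
output embeddings are combined by a weighted sum of their scores. The function
names, the weights, and the toy values are hypothetical and chosen only for
illustration.

```python
import numpy as np


def compatibility(theta_x, W, Phi):
    """Score one image embedding against every class embedding.

    Assumption: bilinear compatibility theta(x)^T W phi(y).
    theta_x: (d_in,) image embedding
    W:       (d_in, d_out) compatibility matrix, assumed already learned
    Phi:     (n_classes, d_out) one output embedding per row
    Returns an array of shape (n_classes,).
    """
    return Phi @ (W.T @ theta_x)


def predict(theta_x, W, Phi):
    """Return the index of the class with the highest compatibility."""
    return int(np.argmax(compatibility(theta_x, W, Phi)))


def combine(scores_by_embedding, weights):
    """Weighted sum of score vectors from different output embeddings.

    Assumption: a simple linear combination; the weights are hypothetical.
    """
    return sum(w * s for w, s in zip(weights, scores_by_embedding))


# Toy example: three unseen classes, each described by a 3-d attribute vector.
theta_x = np.array([0.1, 0.9, 0.2])   # image embedding (toy values)
W_attr = np.eye(3)                    # stand-in for a learned matrix
Phi_attr = np.eye(3)                  # stand-in for supervised attributes

attr_scores = compatibility(theta_x, W_attr, Phi_attr)
word_scores = np.array([0.5, 0.1, 0.4])  # stand-in for word-vector scores

label_attr = predict(theta_x, W_attr, Phi_attr)
label_combined = int(np.argmax(combine([attr_scores, word_scores], [0.5, 0.5])))
```

Because classes enter only through the rows of Phi, a class with no training
images can still be scored as long as its output embedding is available, which
is what makes this setup usable in the zero-shot case.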