Abstract:
Lexical databases following the wordnet paradigm capture information about
words, word senses, and their relationships. A large number of existing tools
and datasets are based on the original WordNet, so extending the landscape of
resources aligned with WordNet leads to great potential for interoperability
and to substantial synergies. Wordnets are being compiled for a considerable
number of languages; however, most have yet to reach a comparable level of
coverage. We propose a method for automatically producing such resources for
new languages based on WordNet, and analyse the implications of this approach
both from a linguistic perspective and with respect to natural language
processing tasks. Our approach takes advantage of the original WordNet in
conjunction with translation dictionaries. A small set of training associations
is used to learn a statistical model for predicting associations between terms
and senses. The associations are represented using a variety of scores that
take into account structural properties as well as semantic relatedness and
corpus frequency information. Although the resulting wordnets are imperfect in
terms of their quality and coverage of language-specific phenomena, we show
that they constitute a cheap and suitable alternative for many applications,
both for monolingual tasks and for cross-lingual interoperability. Apart
from analysing the resources directly, we conducted tests on semantic
relatedness assessment and cross-lingual text classification with very
promising results.
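
The general idea of the approach can be illustrated with a minimal sketch. It
is not the authors' implementation: the toy sense inventory, the translation
dictionary, the two feature scores, and the plain logistic regression are all
assumptions chosen for illustration. They stand in for the richer structural,
relatedness, and frequency scores and the statistical model mentioned above.

```python
import math

# Hypothetical toy data: English terms mapped to WordNet-style sense IDs.
SENSES = {
    "bank": ["bank.n.01", "bank.n.02"],
    "shore": ["bank.n.01"],
    "school": ["school.n.01"],
}

# Hypothetical translation dictionary (German -> English).
TRANSLATIONS = {
    "Ufer": ["bank", "shore"],
    "Bank": ["bank"],
    "Schule": ["school"],
}


def candidates(term):
    """Return every sense reachable from `term` through the dictionary."""
    found = []
    for english in TRANSLATIONS.get(term, []):
        for sense in SENSES.get(english, []):
            if sense not in found:
                found.append(sense)
    return found


def features(term, sense):
    """Illustrative scores for a (term, sense) pair; assumes it is a candidate."""
    translations = TRANSLATIONS[term]
    supporting = [e for e in translations if sense in SENSES.get(e, [])]
    # Share of the term's translations that carry this sense.
    support = len(supporting) / len(translations)
    # Highest inverse ambiguity among the supporting translations.
    specificity = max(1 / len(SENSES[e]) for e in supporting)
    return [1.0, support, specificity]  # the leading 1.0 is a bias feature


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, rate=0.5):
    """Fit logistic regression weights by stochastic gradient ascent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (term, sense), label in examples:
            x = features(term, sense)
            error = label - sigmoid(sum(w * v for w, v in zip(weights, x)))
            weights = [w + rate * error * v for w, v in zip(weights, x)]
    return weights


def predict(weights, term, sense):
    """Estimated probability that `sense` is a valid sense of `term`."""
    x = features(term, sense)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))


# A small, hand-labelled set of training associations (hypothetical).
TRAINING = [
    (("Ufer", "bank.n.01"), 1),
    (("Ufer", "bank.n.02"), 0),
    (("Schule", "school.n.01"), 1),
]

if __name__ == "__main__":
    weights = train(TRAINING)
    for term in TRANSLATIONS:
        for sense in candidates(term):
            print(term, sense, round(predict(weights, term, sense), 3))
```

In this sketch, a candidate sense supported by several translations of a term
receives a higher score than one supported by a single ambiguous translation.
Accepting only associations above a chosen threshold would then yield the
entries of the new wordnet.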