Lexical databases are invaluable sources of knowledge about words,
with numerous applications in areas like NLP, IR, and AI.
We propose a methodology for the automatic construction of a large-scale
lexical database where words of many languages are hierarchically
organized in terms of their
meanings and their semantic relations to other words. This resource is
bootstrapped from WordNet, a well-known English-language resource. Our approach extends
WordNet with around
1.5 million meaning links for 800,000 words in over 200 languages,
drawing on evidence extracted
from a variety of resources including existing (monolingual) wordnets,
(mostly bilingual) translation
dictionaries, and parallel corpora.
Graph-based scoring functions and statistical learning techniques are
used to iteratively integrate
this information and build an output graph. Experiments show that this
wordnet has a high
level of precision and coverage, and that it can be useful in applied
tasks such as
cross-lingual text classification.