Keywords:
-
Abstract:
The introduction of hierarchical thesauri (HT) that contain significant
semantic information has led researchers to investigate their potential for
improving performance on the text classification task by extending the
traditional “bag of words” representation to incorporate syntactic and semantic
relationships among words. In this paper we address this problem by proposing a
Word Sense Disambiguation (WSD) approach based on the intuition that word
proximity in the document also implies proximity in the HT graph. We argue that
the high precision exhibited by our WSD algorithm on various
human-disambiguated benchmark datasets makes it appropriate for the
classification task. Moreover, we define a semantic kernel, based on the general concept of
GVSM kernels, that captures the semantic relations contained in the
hierarchical thesaurus. Finally, we conduct experiments on various corpora
and achieve a systematic improvement in classification accuracy with the SVM
algorithm, especially when the training set is small.
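
To illustrate the general idea of a GVSM-style kernel mentioned above, the
following minimal sketch maps bag-of-words vectors into a concept space before
taking inner products. It is not the kernel defined in the paper, whose details
the abstract does not give. The vocabulary, the concepts, the matrix P and the
function name gvsm_kernel are all hypothetical and chosen only for illustration.

```python
import numpy as np

def gvsm_kernel(docs_a, docs_b, term_to_concept):
    """Generic GVSM-style kernel: K = (A P)(B P)^T.

    docs_a, docs_b: bag-of-words matrices, one row per document.
    term_to_concept: matrix P mapping each term to concept weights
    (assumed here; in practice it would be derived from a thesaurus).
    """
    return (docs_a @ term_to_concept) @ (docs_b @ term_to_concept).T

# Hypothetical vocabulary: ["car", "automobile", "banana"]
# Hypothetical concepts:   ["vehicle", "fruit"]
P = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Three one-word documents, one per vocabulary term.
docs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])

bow = docs @ docs.T                # plain bag-of-words kernel
sem = gvsm_kernel(docs, docs, P)   # kernel computed in concept space
```

With these toy values the plain bag-of-words kernel treats "car" and
"automobile" as unrelated, while the concept-space kernel gives them a nonzero
similarity because both map to the same concept. A matrix such as sem could be
passed to an SVM implementation that accepts precomputed kernels.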