Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification

Mavroeidis, Dimitrios; Tsatsaronis, George; Vazirgiannis, Michalis; Theobald, Martin; Weikum, Gerhard; Jorge, Alípio; Torgo, Luís; Brazdil, Pavel; Camacho, Rui; Joao, Gama

Released

Conference paper

Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification

MPG authors

Vazirgiannis, Michalis
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Theobald, Martin
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Joao, Gama
Max Planck Society;

Citation

Mavroeidis, D., Tsatsaronis, G., Vazirgiannis, M., Theobald, M., & Weikum, G. (2005). Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification. In Knowledge discovery in databases: PKDD 2005: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (pp. 181-192). Berlin, Germany: Springer.

Citation link: https://hdl.handle.net/11858/00-001M-0000-000F-2846-E

Abstract

The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving text classification performance by extending the traditional “bag of words” representation with syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document also implies proximity in the HT graph. We argue that the high precision our WSD algorithm exhibits on various manually disambiguated benchmark datasets makes it appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora with the SVM algorithm and achieve a systematic improvement in classification accuracy, especially when the training set is small.
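
The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the toy thesaurus `PARENT`, the sense inventory `SENSES`, the brute-force search in `disambiguate`, and the unweighted hypernym expansion in `semantic_kernel` are all assumptions made for illustration. The sketch picks the sense combination whose nodes lie closest together in the thesaurus tree, then computes a GVSM-style kernel as a dot product in a concept space where each sense also activates its hypernyms.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy thesaurus: node -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "financial_concept": "entity",
    "bank#finance": "financial_concept",
    "loan": "financial_concept",
    "landform": "entity",
    "bank#river": "landform",
    "stream": "landform",
}

# Hypothetical sense inventory: surface word -> candidate senses.
SENSES = {
    "bank": ["bank#finance", "bank#river"],
    "loan": ["loan"],
    "stream": ["stream"],
}


def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def path_length(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    depth_from_b = {n: i for i, n in enumerate(ancestors(b))}
    for i, n in enumerate(ancestors(a)):
        if n in depth_from_b:
            return i + depth_from_b[n]
    raise ValueError("nodes are not connected")


def disambiguate(words):
    """Pick one sense per word so that the total pairwise path length is minimal.

    Brute force over all sense combinations; only viable for short windows.
    """
    candidates = [SENSES[w] for w in words]
    best = min(
        product(*candidates),
        key=lambda combo: sum(path_length(a, b) for a, b in combinations(combo, 2)),
    )
    return dict(zip(words, best))


def concept_vector(senses):
    """Count concepts: each sense activates itself and all of its hypernyms."""
    vec = Counter()
    for sense in senses:
        vec.update(ancestors(sense))
    return vec


def semantic_kernel(senses_a, senses_b):
    """GVSM-style kernel: dot product of the two documents in concept space."""
    va, vb = concept_vector(senses_a), concept_vector(senses_b)
    return sum(va[c] * vb[c] for c in va)
```

In this toy setup, `disambiguate(["bank", "loan"])` selects `bank#finance`, while `disambiguate(["bank", "stream"])` selects `bank#river`. A plain bag-of-senses dot product between `["bank#finance"]` and `["loan"]` would be zero, whereas `semantic_kernel` credits their shared hypernyms. A kernel matrix built this way could be passed to an SVM implementation that accepts precomputed kernels.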