Neighborhood Conscious Hypertext Categorization

Angelova, Ralitsa

Angelova, R. (2004). Neighborhood Conscious Hypertext Categorization. Master Thesis, Universität des Saarlandes, Saarbrücken.

Item is released

Basic data

Item permalink: https://hdl.handle.net/11858/00-001M-0000-0027-F483-0
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0027-F484-E

Genre: Thesis

Files

Masterarbeit-Ange_Rali_2004.pdf (any full text), 2 MB

File permalink:
-

Name:
Masterarbeit-Ange_Rali_2004.pdf

Description:
-

OA-Status:

Visibility:
Restricted (Max Planck Institute for Informatics, MSIN)

MIME type / checksum:
application/pdf

Technical metadata:

Copyright date:
-

Copyright Info:
-

License:
-

External references

Creators

Creators:
Angelova, Ralitsa^{1, 2}, Author
Weikum, Gerhard^{1}, Advisor

Affiliations:
1 Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
2 International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551

Content

Keywords: -

Abstract: A fundamental issue in statistics, pattern recognition, and machine learning is that of classification. In a traditional classification problem, we wish to assign one of k labels (or classes) to each of n objects (or documents) in a way that is consistent with some observed data available about the problem. To achieve better classification results, we try to capture the information carried by pairwise relationships between objects, in particular hyperlinks between web documents. The use of hyperlinks poses new problems not addressed in the extensive text classification literature. Links contain high-quality semantic clues that a purely text-based classifier cannot take advantage of. However, exploiting link information is non-trivial because it is noisy, and a naive use of terms in the link neighborhood of a document can degrade accuracy. The problem becomes even harder when only a very small fraction of document labels are known to the classifier and can be used for training, as is the case in realistic classification scenarios. Our work is based on an algorithm proposed by Soumen Chakrabarti and uses the theory of Markov Random Fields to derive a relaxation labelling technique for the class assignment problem. We show that the extra information contained in the hyperlinks between documents can be exploited to significantly improve classification performance. We implemented our algorithm in Java and ran our experiments on two data sets obtained from the DBLP and IMDB databases. We observed up to 5.5% improvement in classification accuracy and up to 10% higher recall and precision.
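The abstract names relaxation labelling without spelling out the update. As an illustration only, here is a minimal, generic sketch of relaxation labelling over a link graph; it is not the thesis's Java implementation, and its update rule may differ from the one derived there. The function name `relaxation_labeling`, the class-compatibility matrix `compat`, the synchronous update, and the fixed iteration count are all assumptions. Each unlabelled document combines its text-only class distribution with the current class beliefs of its linked neighbours, while documents with known labels stay fixed.

```python
from typing import Dict, List, Sequence


def relaxation_labeling(
    prior: Sequence[Sequence[float]],    # prior[i][c]: text-only P(class c | doc i) (assumed input)
    neighbors: Sequence[Sequence[int]],  # neighbors[i]: docs linked to or from doc i
    compat: Sequence[Sequence[float]],   # compat[c][d]: hypothetical affinity of classes c, d across a link
    known: Dict[int, int],               # known[i]: class label of a training document
    iterations: int = 20,                # assumed fixed number of sweeps
) -> List[List[float]]:
    """Generic relaxation labelling sketch (not the thesis's exact algorithm)."""
    n, k = len(prior), len(compat)

    def one_hot(c: int) -> List[float]:
        return [1.0 if d == c else 0.0 for d in range(k)]

    # Labelled documents are clamped; unlabelled ones start from the text prior.
    probs = [one_hot(known[i]) if i in known else list(prior[i]) for i in range(n)]

    for _ in range(iterations):
        new_probs: List[List[float]] = []
        for i in range(n):
            if i in known:
                new_probs.append(probs[i])
                continue
            scores = []
            for c in range(k):
                score = prior[i][c]
                for j in neighbors[i]:
                    # Expected compatibility of class c with neighbour j's current belief.
                    score *= sum(probs[j][d] * compat[c][d] for d in range(k))
                scores.append(score)
            total = sum(scores)
            # Renormalise; fall back to the text prior if every score vanished.
            new_probs.append([s / total for s in scores] if total > 0 else list(prior[i]))
        probs = new_probs  # synchronous update: all docs use last sweep's beliefs
    return probs


if __name__ == "__main__":
    # Toy graph 0 - 1 - 2 with two classes; only doc 0 is labelled (class 0).
    prior = [[0.5, 0.5], [0.5, 0.5], [0.4, 0.6]]  # doc 2's text alone leans to class 1
    neighbors = [[1], [0, 2], [1]]
    compat = [[0.9, 0.1], [0.1, 0.9]]  # assumed: linked docs tend to share a class
    result = relaxation_labeling(prior, neighbors, compat, known={0: 0})
    for i, p in enumerate(result):
        print(i, [round(x, 3) for x in p])
```

In this toy run the link evidence propagates from the labelled document 0 through document 1 and overrides document 2's weak text preference for class 1. Multiplying neighbour terms directly is fine for small graphs; a larger implementation would likely work in log space to avoid underflow.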

Details

Language(s): eng - English

Dates: Accepted: 2004-07; Published: 2004

Publication status: Published

Pages: -

Place, publisher, edition: Saarbrücken : Universität des Saarlandes

Table of contents: -

Review method: -

Identifiers: BibTeX citekey: Ralitsa2004

Degree: Master
