

Conference Paper

Restrictive Clustering and Metaclustering for self-organizing Document Collections

MPS-Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons45482

Siersdorfer, Stefan
Databases and Information Systems, MPI for Informatics, Max Planck Society

http://pubman.mpdl.mpg.de/cone/persons/resource/persons45500

Sizov, Sergej
Databases and Information Systems, MPI for Informatics, Max Planck Society

Citation

Siersdorfer, S., & Sizov, S. (2004). Restrictive Clustering and Metaclustering for self-organizing Document Collections. In Proceedings of SIGIR 2004: the Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 226-233). New York, USA: ACM.


Cite as: http://hdl.handle.net/11858/00-001M-0000-000F-2B27-F
Abstract
This paper addresses the problem of automatically structuring heterogeneous document collections by using clustering methods. In contrast to traditional clustering, we study restrictive methods and ensemble-based meta methods that may decide to leave out some documents rather than assigning them to inappropriate clusters with low confidence. These techniques result in higher cluster purity and better overall accuracy, and they make unsupervised self-organization more robust. Our comprehensive experimental studies on three different real-world data collections demonstrate these benefits. The proposed methods seem particularly suitable for automatically substructuring personal email folders or personal Web directories that are populated by focused crawlers, and they can be combined with supervised classification techniques.
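
The idea of leaving documents out instead of forcing a low-confidence assignment can be illustrated with a minimal sketch. It is not the authors' algorithm: the cosine-similarity margin criterion, the unanimity rule for the meta decision, and the names `restrictive_assign`, `meta_assign`, and `min_margin` are all assumptions made for illustration. The meta step also assumes that cluster labels from the different methods have already been aligned.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def restrictive_assign(doc, centroids, min_margin=0.2):
    """Return the index of the most similar centroid, or None to abstain.

    Hypothetical confidence criterion: abstain unless the best similarity
    beats the runner-up by at least min_margin.
    """
    sims = sorted(((cosine(doc, c), i) for i, c in enumerate(centroids)), reverse=True)
    best_sim, best_idx = sims[0]
    runner_up = sims[1][0] if len(sims) > 1 else 0.0
    return best_idx if best_sim - runner_up >= min_margin else None


def meta_assign(votes):
    """Hypothetical meta decision: keep a label only if every method voted
    and all votes agree; otherwise abstain (None)."""
    if votes and None not in votes and len(set(votes)) == 1:
        return votes[0]
    return None


# Toy data: two cluster centroids and three documents as 2-d term vectors.
centroids = [[1.0, 0.0], [0.0, 1.0]]
docs = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.8]]
labels = [restrictive_assign(d, centroids) for d in docs]
print(labels)  # the ambiguous middle document is left out: [0, None, 1]
```

Raising `min_margin` in this sketch leaves out more documents, so the clusters that remain contain only the clearer cases. That is the kind of trade-off between coverage and cluster purity the abstract describes.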