  Using Restrictive Classification and Meta Classification for Junk Elimination

Siersdorfer, S., & Weikum, G. (2005). Using Restrictive Classification and Meta Classification for Junk Elimination. In Advances in information retrieval: 27th European Conference on IR Research, ECIR 2005 (pp. 287-299). Berlin, Germany: Springer.

Files

SiersdorferW05.pdf (Any fulltext), 193KB
 
Name: SiersdorferW05.pdf
Visibility: Private
MIME-Type: application/pdf

Creators

 Creators:
Siersdorfer, Stefan (1), Author
Weikum, Gerhard (1), Author
Losada, David, Editor
Fernández-Luna, Juan M., Editor
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -
 Abstract: This paper addresses the problem of supervised classification on document collections that also contain junk documents. By junk documents we mean documents that do not belong to any of the topic categories (classes) we are interested in. Documents of this type typically cannot be covered by the training set; nevertheless, in many real-world applications (e.g., classification of web or intranet content, focused crawling) such documents occur quite often, and a classifier has to make a decision about them. We tackle this problem by using restrictive methods and ensemble-based meta methods that may decide to leave out some documents rather than assign them to inappropriate classes with low confidence. Our experiments with four different data sets show that the proposed techniques can eliminate a relatively large fraction of junk documents while dismissing only a significantly smaller fraction of potentially interesting documents.
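
 Illustration (not from the paper): the abstract describes classifiers that may abstain instead of assigning a low-confidence class. The sketch below shows one simple way such behavior could look, assuming each base classifier returns a label with a confidence score. The names restrictive_classify, meta_classify, keyword_classifier, and the threshold and min_agree parameters are hypothetical, and the keyword scorer is only a toy stand-in; the paper's actual methods may differ.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

# Hypothetical base classifier type: document -> (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def keyword_classifier(keywords: Dict[str, Set[str]]) -> Classifier:
    """Toy stand-in for a trained classifier (an assumption for illustration).

    Confidence is the fraction of the document's words that match the
    keyword set of the best-scoring class.
    """
    def classify(doc: str) -> Tuple[str, float]:
        words = doc.lower().split()
        if not words:
            return "", 0.0
        scores = {
            label: sum(w in kws for w in words) / len(words)
            for label, kws in keywords.items()
        }
        best = max(scores, key=scores.get)
        return best, scores[best]
    return classify


def restrictive_classify(clf: Classifier, doc: str,
                         threshold: float) -> Optional[str]:
    """Return the predicted label, or None (abstain) below the threshold."""
    label, confidence = clf(doc)
    return label if confidence >= threshold else None


def meta_classify(clfs: Sequence[Classifier], doc: str,
                  threshold: float, min_agree: int) -> Optional[str]:
    """Accept a label only if at least min_agree base classifiers,
    each applied restrictively, vote for the same label."""
    votes = [restrictive_classify(c, doc, threshold) for c in clfs]
    votes = [v for v in votes if v is not None]
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


clf_a = keyword_classifier({"sports": {"goal", "match"},
                            "music": {"guitar", "concert"}})
clf_b = keyword_classifier({"sports": {"goal", "stadium"},
                            "music": {"guitar", "album"}})

on_topic = meta_classify([clf_a, clf_b], "goal match tonight", 0.3, 2)
junk = meta_classify([clf_a, clf_b], "buy cheap pills now", 0.3, 2)
```

 Raising threshold or min_agree in this sketch makes the ensemble leave out more documents, which is the trade-off the abstract describes: more junk is eliminated at the cost of dismissing some on-topic documents.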

Details

Language(s): eng - English
 Dates: 2006-01-20, 2005
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: eDoc: 278888
Other: Local-ID: C1256DBF005F876D-E0BA01CD4D8BA808C1256F8E0061D319-SiersdorferW05
 Degree: -

Event

Title: Untitled Event
Place of Event: Santiago de Compostela, Spain
Start-/End Date: 2005-03-21

Source 1

Title: Advances in information retrieval : 27th European Conference on IR Research, ECIR 2005
Source Genre: Proceedings
 Creator(s):
Affiliations:
Publ. Info: Berlin, Germany : Springer
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 287 - 299
Identifier: ISBN: 3-540-25295-9

Source 2

Title: Lecture Notes in Computer Science
Source Genre: Series
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 3408
Sequence Number: -
Start / End Page: -
Identifier: -