Automated Retraining Methods for Document Classification and Their Parameter Tuning

Siersdorfer, S., & Weikum, G. (2005). Automated Retraining Methods for Document Classification and Their Parameter Tuning. In Web information systems engineering - WISE 2005: 6th International Conference on Web Information Systems Engineering (pp. 478-486). Berlin, Germany: Springer.

Files

SiersdorferW-WISE05.pdf (Any fulltext), 330KB
 
File Permalink: -
Name: SiersdorferW-WISE05.pdf
Description: -
OA-Status:
Visibility: Private
MIME-Type / Checksum: application/pdf
Technical Metadata:
Copyright Date: -
Copyright Info: -
License: -

Creators

Creators:
Siersdorfer, Stefan (1), Author
Weikum, Gerhard (1), Author
Ngu, Anne H. H., Editor
Kitsuregawa, Masaru, Editor
Neuhold, Erich J., Editor
Chung, Jen-Yao, Editor
Sheng, Quan Z., Editor
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -
Abstract: This paper addresses the problem of semi-supervised classification on document collections using retraining (also called self-training). A possible application is focused Web crawling, which may start with very few manually selected training documents but can be enhanced by automatically adding initially unlabeled, positively classified Web pages for retraining. Such an approach is not robust by itself and faces tuning problems regarding parameters such as the number of selected documents, the number of retraining iterations, and the ratio of positively and negatively classified samples used for retraining. The paper develops methods for automatically tuning these parameters, based on predicting the leave-one-out error for a retrained classifier and on preventing the classifier from being diluted by selecting too many or too weak documents for retraining. Our experiments with three different datasets confirm the practical viability of the approach.
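
The retraining loop named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the nearest-centroid classifier, the function names `centroid` and `self_train`, and the parameters `n_iter`, `k_pos`, and `k_neg` are assumptions chosen for illustration, and the three parameters are fixed by hand here rather than tuned automatically via a predicted leave-one-out error as the paper proposes.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def self_train(pos, neg, unlabeled, n_iter=2, k_pos=1, k_neg=1):
    """Hypothetical self-training loop with a nearest-centroid classifier.

    n_iter: number of retraining iterations (hand-picked assumption)
    k_pos, k_neg: how many of the most confidently classified positive and
    negative samples are added per iteration (hand-picked assumption)
    """
    pos, neg, pool = list(pos), list(neg), list(unlabeled)
    for _ in range(n_iter):
        if not pool:
            break
        c_pos, c_neg = centroid(pos), centroid(neg)

        def margin(x):
            # > 0: closer to the positive centroid; magnitude acts as confidence
            return dist(x, c_neg) - dist(x, c_pos)

        # Indices of pool items, from most confidently negative to most positive
        ranked = sorted(range(len(pool)), key=lambda i: margin(pool[i]))
        neg_idx = [i for i in ranked[:k_neg] if margin(pool[i]) < 0]
        start = max(0, len(ranked) - k_pos)
        pos_idx = [i for i in ranked[start:] if margin(pool[i]) > 0]

        # Retrain on the enlarged sets in the next iteration
        pos += [pool[i] for i in pos_idx]
        neg += [pool[i] for i in neg_idx]
        chosen = set(pos_idx) | set(neg_idx)
        pool = [x for i, x in enumerate(pool) if i not in chosen]
    return pos, neg, pool


if __name__ == "__main__":
    seeds_pos = [(1.0, 1.0)]
    seeds_neg = [(-1.0, -1.0)]
    docs = [(2.0, 2.0), (0.5, 0.5), (-2.0, -2.0), (-0.5, -0.5)]
    print(self_train(seeds_pos, seeds_neg, docs, n_iter=1))
```

Raising `k_pos` or `k_neg` adds more documents per round, including ones with a small margin, which is the dilution risk the abstract describes; the sketch makes no attempt to detect or prevent it.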

Details

Language(s): eng - English
 Dates: 2006-01-20; 2005
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: eDoc: 278887
Other: Local-ID: C1256DBF005F876D-93705A0E8058ECB5C12570D300438C11-SiersdorferW-WISE05
 Degree: -

Event

Title: Untitled Event
Place of Event: New York, USA
Start-/End Date: 2005-11-20

Source 1

Title: Web information systems engineering - WISE 2005 : 6th International Conference on Web Information Systems Engineering
Source Genre: Proceedings
 Creator(s):
Affiliations:
Publ. Info: Berlin, Germany : Springer
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 478 - 486
Identifier: ISBN: 3-540-30017-1

Source 2

Title: Lecture Notes in Computer Science
Source Genre: Series
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 3806
Sequence Number: -
Start / End Page: -
Identifier: -