  Fast logistic regression for text categorization with variable-length n-grams

Ifrim, G., Bakir, G., & Weikum, G. (2008). Fast logistic regression for text categorization with variable-length n-grams. In B. Liu, S. Sarawagi, & Y. Li (Eds.), KDD 2008: proceedings of the 14th ACM KDD International Conference on Knowledge Discovery & Data Mining (pp. 354-362). New York, NY: ACM.

Creators

 Creators:
Ifrim, Georgiana (1), Author
Bakir, Goekhan, Author
Weikum, Gerhard (1), Author
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -
 Abstract: A common representation used in text categorization is the bag-of-words model (a.k.a. the unigram model). Learning with this representation typically involves some preprocessing, e.g., stop-word removal and stemming, which results in one explicit tokenization of the corpus. In this work, we introduce a logistic regression approach in which learning involves automatic tokenization. This allows us to weaken the a priori knowledge required about the corpus and results in a tokenization with variable-length (word or character) n-grams as basic tokens. We accomplish this by solving logistic regression using gradient ascent in the space of all n-grams. We show that this can be done very efficiently using a branch-and-bound approach that chooses the maximum gradient ascent direction projected onto a single dimension (i.e., a candidate feature). Although the space is very large, our method allows us to investigate variable-length n-gram learning. We demonstrate the efficiency of our approach compared to state-of-the-art classifiers used for text categorization, such as cyclic coordinate descent logistic regression and support vector machines.
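The core idea in the abstract (gradient ascent where each step moves along the single n-gram coordinate with the largest gradient) can be illustrated with a small Python sketch. This is not the authors' implementation: it enumerates all character n-grams exhaustively instead of pruning the search with branch and bound, it uses binary presence features and a fixed step size, and the names `char_ngrams`, `train`, `predict`, `max_n`, and `step` are hypothetical choices made for this illustration only.

```python
import math


def char_ngrams(text, max_n):
    """Return the set of all character n-grams of length 1..max_n in text."""
    return {text[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(text) - n + 1)}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(docs, labels, max_n=3, iterations=20, step=0.5):
    """Greedy coordinate-wise gradient ascent on the logistic log-likelihood.

    labels are 0 or 1; features are binary n-gram presence indicators.
    Each iteration updates only the n-gram with the largest absolute
    gradient. Assumption: the best coordinate is found by exhaustive
    enumeration here, not by branch and bound.
    """
    feats = [char_ngrams(d, max_n) for d in docs]
    weights = {}  # sparse model: only selected n-grams get an entry
    for _ in range(iterations):
        grad = {}
        for f, y in zip(feats, labels):
            p = sigmoid(sum(weights.get(g, 0.0) for g in f))
            for g in f:
                # d(log-likelihood)/d(w_g) = sum over docs of (y - p) * x_g
                grad[g] = grad.get(g, 0.0) + (y - p)
        # sorted() makes tie-breaking deterministic across runs
        best = max(sorted(grad), key=lambda g: abs(grad[g]))
        weights[best] = weights.get(best, 0.0) + step * grad[best]
    return weights


def predict(weights, doc, max_n=3):
    """Return the estimated probability that doc belongs to class 1."""
    score = sum(weights.get(g, 0.0) for g in char_ngrams(doc, max_n))
    return sigmoid(score)


if __name__ == "__main__":
    docs = ["spam spam offer", "cheap offer now",
            "meeting at noon", "lunch meeting today"]
    labels = [1, 1, 0, 0]
    model = train(docs, labels)
    for d in docs:
        print(d, round(predict(model, d), 3))
```

Because only one coordinate changes per iteration, the learned model has at most `iterations` non-zero weights, which is what makes searching such a large feature space plausible at all. The exhaustive gradient computation in this sketch is the part a branch-and-bound search would replace.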

Details

Language(s): eng - English
 Dates: 2009-03-25, 2008
 Publication Status: Issued
 Pages: -
 Publishing info: New York, NY : ACM
 Table of Contents: -
 Rev. Type: -
 Identifiers: eDoc: 428111
DOI: http://doi.acm.org/10.1145/1401890.1401936
URI: http://portal.acm.org/citation.cfm?id=1401936#
Other: Local-ID: C125756E0038A185-233A36CCB8D757B1C12574F700499649-Ifrim:KDD08
 Degree: -

Event

Title: 14th ACM KDD International Conference on Knowledge Discovery & Data Mining (KDD 2008)
Place of Event: Las Vegas, Nevada, USA
Start-/End Date: 2008-08-24 - 2008-08-27

Source 1

Title: KDD 2008 : proceedings of the 14th ACM KDD International Conference on Knowledge Discovery & Data Mining
Source Genre: Proceedings
 Creator(s):
Liu, Bing, Editor
Sarawagi, Sunita (1), Editor
Li, Ying, Editor
Affiliations:
(1) Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019
Publ. Info: New York, NY : ACM
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 354 - 362
Identifier: ISBN: 978-1-60558-193-4