  Fast logistic regression for text categorization with variable-length n-grams

Ifrim, G., Bakir, G., & Weikum, G. (2008). Fast logistic regression for text categorization with variable-length n-grams. In B. Liu, S. Sarawagi, & Y. Li (Eds.), KDD 2008: proceedings of the 14th ACM KDD International Conference on Knowledge Discovery & Data Mining (pp. 354-362). New York, NY: ACM.

Creators

Ifrim, Georgiana (1), Author
Bakir, Goekhan, Author
Weikum, Gerhard (1), Author
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Keywords: -
Abstract: A common representation used in text categorization is the bag-of-words model (a.k.a. the unigram model). Learning with this particular representation typically involves some preprocessing, e.g. stopword removal and stemming. This results in one explicit tokenization of the corpus. In this work, we introduce a logistic regression approach where learning involves automatic tokenization. This allows us to weaken the a-priori required knowledge about the corpus and results in a tokenization with variable-length (word or character) n-grams as basic tokens. We accomplish this by solving logistic regression using gradient ascent in the space of all n-grams. We show that this can be done very efficiently using a branch and bound approach which chooses the maximum gradient ascent direction projected onto a single dimension (i.e., candidate feature). Although the space is very large, our method allows us to investigate variable-length n-gram learning. We demonstrate the efficiency of our approach compared to state-of-the-art classifiers used for text categorization such as cyclic coordinate descent logistic regression and support vector machines.

Details

Language(s): eng - English
Date: 2009-03-25; 2008
Publication status: Published
Pages: -
Place, publisher, edition: New York, NY : ACM
Table of contents: -
Review type: -
Identifiers: eDoc: 428111
DOI: http://doi.acm.org/10.1145/1401890.1401936
URI: http://portal.acm.org/citation.cfm?id=1401936#
Other: Local-ID: C125756E0038A185-233A36CCB8D757B1C12574F700499649-Ifrim:KDD08
Degree type: -

Event

Title: Untitled Event
Venue: Las Vegas, Nevada, USA
Start/end date: 2008-08-24 - 2008-08-27

Source 1

Title: KDD 2008 : proceedings of the 14th ACM KDD International Conference on Knowledge Discovery & Data Mining
Source genre: Conference proceedings
Creators:
Liu, Bing, Editor
Sarawagi, Sunita (1), Editor
Li, Ying, Editor
Affiliations:
(1) Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019
Place, publisher, edition: New York, NY : ACM
Pages: - Volume / Issue: - Article number: - Start / end page: 354 - 362 Identifier: ISBN: 978-1-60558-193-4