
Item


Released

Conference Paper

Improving Native Language Identification with TF-IDF weighting

MPS-Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons4454

Gebre, Binyam Gebrekidan
The Language Archive, MPI for Psycholinguistics, Max Planck Society

http://pubman.mpdl.mpg.de/cone/persons/resource/persons216

Wittenburg, Peter
The Language Archive, MPI for Psycholinguistics, Max Planck Society

Fulltext (public)

W13-1728.pdf
(Publisher version), 136KB

Citation

Gebre, B. G., Zampieri, M., Wittenburg, P., & Heskes, T. (2013). Improving Native Language Identification with TF-IDF weighting. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 216-223).


Cite as: http://hdl.handle.net/11858/00-001M-0000-000E-FB4D-B
Abstract
This paper presents a Native Language Identification (NLI) system based on TF-IDF weighting schemes and linear classifiers: support vector machines, logistic regression, and perceptrons. The system participated in the closed-training track of the 2013 NLI Shared Task, achieving an overall accuracy of 0.814 on a set of 11 native languages, only 2.2 percentage points below the winner's performance. In subsequent evaluations using 10-fold cross-validation (as given by the organizers) on the combined training and development data, the best average accuracy obtained was 0.8455; the features that contributed to this accuracy were the TF-IDF of the combined word unigrams and bigrams.
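
The kind of pipeline the abstract describes (TF-IDF over word unigrams and bigrams, fed to a linear classifier) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the smoothed IDF formula, the L2 normalisation, the whitespace tokenisation, the choice of a plain multi-class perceptron (picked only for brevity among the three classifier families named), the function names, the toy sentences, and the labels `L1_A` and `L1_B` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def word_ngrams(text, n_values=(1, 2)):
    """Word unigrams and bigrams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in n_values:
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def fit_idf(docs):
    """Assumed smoothed IDF variant: log((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(word_ngrams(doc)))
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf_vector(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; n-grams unseen in training are dropped."""
    tf = Counter(g for g in word_ngrams(doc) if g in idf)
    vec = {g: count * idf[g] for g, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}

def predict(weights, vec):
    """Label whose weight vector has the highest dot product with vec."""
    def score(label):
        w = weights[label]
        return sum(w.get(g, 0.0) * v for g, v in vec.items())
    return max(weights, key=score)

def train_perceptron(vectors, labels, epochs=10):
    """Multi-class perceptron: weights change only when a prediction is wrong."""
    weights = {label: defaultdict(float) for label in sorted(set(labels))}
    for _ in range(epochs):
        for vec, gold in zip(vectors, labels):
            guess = predict(weights, vec)
            if guess != gold:
                for g, v in vec.items():
                    weights[gold][g] += v
                    weights[guess][g] -= v
    return weights

# Invented toy data; the labels are hypothetical stand-ins for native languages.
train_docs = [
    "i am agree with this opinion",
    "i am agree that the travel is important",
    "he have many informations about the city",
    "she have many informations about the travel",
]
train_labels = ["L1_A", "L1_A", "L1_B", "L1_B"]

idf = fit_idf(train_docs)
train_vectors = [tfidf_vector(d, idf) for d in train_docs]
weights = train_perceptron(train_vectors, train_labels)

print(predict(weights, tfidf_vector("they have many informations", idf)))
```

Bigrams such as "am agree" or "many informations" are what let the classifier pick up on recurring learner constructions rather than isolated words; a real system would train on the shared-task corpus and evaluate with the cross-validation folds mentioned in the abstract.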