



Journal Article

Distributions of cognates in Europe as based on Levenshtein distance


Schepens, Job
Centre for Language Studies, Radboud University Nijmegen, NL;
International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, Nijmegen, NL;

Fulltext (public): Publisher version, 557KB


Cite as:
Schepens, J., Dijkstra, T., & Grootjen, F. (2012). Distributions of cognates in Europe as based on Levenshtein distance. Bilingualism: Language and Cognition, 15(SI), 157-166. doi:10.1017/S1366728910000623.

Abstract
Researchers on bilingual processing can benefit from computational tools developed in artificial intelligence. We show that a normalized Levenshtein distance function can efficiently and reliably simulate bilingual orthographic similarity ratings. Orthographic similarity distributions of cognates and non-cognates were identified across pairs of six European languages: English, German, French, Spanish, Italian, and Dutch. Semantic equivalence was determined using the conceptual structure of a translation database. Using a similarity threshold, we could select large numbers of cognates that included nearly all of the stimulus materials used in experimental studies. The identified numbers of form-similar and identical cognates correlated highly with the branch lengths of phylogenetic language family trees, supporting the usefulness of the new measure for cross-language comparison. The normalized Levenshtein distance function can be considered a new formal model of cross-language orthographic similarity.
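The abstract does not spell out the computation, so the following Python sketch illustrates the general idea under stated assumptions: a standard Levenshtein distance (insertions, deletions and substitutions each cost 1), converted into a similarity score by dividing by the length of the longer word, and a threshold that marks translation pairs as cognates. The normalization by the longer word, the threshold of 0.5, the English/Dutch word pairs and the function names are all hypothetical choices for illustration, not the paper's exact settings, data or code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (standard dynamic programming)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical helper: similarity in [0, 1], computed as
    1 - distance / length of the longer word (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical translation pairs (English, Dutch). In the study, translation
# equivalence came from a translation database, not a hand-made list.
PAIRS = [("hotel", "hotel"), ("tomato", "tomaat"), ("house", "huis"),
         ("dog", "hond"), ("chair", "stoel")]

# Hypothetical cut-off; the abstract does not state the value used.
THRESHOLD = 0.5


def select_cognates(pairs, threshold=THRESHOLD):
    """Hypothetical helper: keep the pairs whose similarity reaches the
    threshold, returned as (word1, word2, score) tuples."""
    scored = [(a, b, normalized_similarity(a, b)) for a, b in pairs]
    return [(a, b, s) for a, b, s in scored if s >= threshold]


if __name__ == "__main__":
    for a, b, s in select_cognates(PAIRS):
        print(f"{a:8} {b:8} {s:.2f}")
```

Running it prints the two pairs that reach the threshold: hotel/hotel with a score of 1.00 and tomato/tomaat with 0.67 (distance 2 over 6 letters), while house/huis (0.40), dog/hond (0.25) and chair/stoel (0.00) fall below it. In this sketch a score of exactly 1.0 corresponds to an identical cognate and lower scores at or above the threshold to form-similar cognates.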