Automatic Estimation of Lexical Concreteness in 77 Languages

Thompson, Bill; Lupyan, Gary

Please note that a newer version of this item is available:
https://pure.mpg.de/pubman/item/item_2622741_5

Released

Conference Paper

MPS-Authors

Thompson, Bill
Language and Cognition Department, MPI for Psycholinguistics, Max Planck Society

External Resource

http://mindmodeling.org/cogsci2018/papers/0222/0222.pdf
(Publisher version)

Fulltext (public)

Thompson_Lupyan_2018.pdf
(Publisher version), 666KB

Citation

Thompson, B., & Lupyan, G. (2018). Automatic Estimation of Lexical Concreteness in 77 Languages. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1122-1127). Austin, TX: Cognitive Science Society.

Cite as: https://hdl.handle.net/21.11116/0000-0001-BEA1-3

Abstract

We estimate lexical Concreteness for millions of words across 77 languages. Using a simple regression framework, we combine vector-based models of lexical semantics with experimental norms of Concreteness in English and Dutch. By applying techniques to align vector-based semantics across distinct languages, we compute and release Concreteness estimates at scale in numerous languages for which experimental norms are not currently available. This paper lays out the technique and its efficacy. Although this is a difficult dataset to evaluate immediately, Concreteness estimates computed from English correlate with Dutch experimental norms at $\rho$ = .75 in the vocabulary at large, increasing to $\rho$ = .8 among Nouns. Our predictions also recapitulate attested relationships with word frequency. The approach we describe can be readily applied to numerous lexical measures beyond Concreteness.
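
The pipeline outlined in the abstract (fit a regression from word vectors to norms in one language, align a second language's vectors to the same space, then predict) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic, ridge regression stands in for the "simple regression framework", orthogonal Procrustes stands in for the alignment technique, and all function and variable names are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                        Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def procrustes_align(src, tgt):
    """Orthogonal matrix R minimizing the Frobenius norm of src @ R - tgt."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def spearman(a, b):
    """Spearman rank correlation (assumes no tied values)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


# Synthetic stand-ins (assumption): "source" vectors play the role of a
# language with norms; "target" vectors are a rotated, noisy copy.
rng = np.random.default_rng(0)
n_words, dim, n_dict, n_train = 500, 20, 200, 400
src_vecs = rng.normal(size=(n_words, dim))
norms = src_vecs @ rng.normal(size=dim) + 0.1 * rng.normal(size=n_words)
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt_vecs = src_vecs @ rotation + 0.05 * rng.normal(size=(n_words, dim))

# 1. Align the target space to the source space using a seed dictionary
#    (here, the first n_dict rows are treated as translation pairs).
R = procrustes_align(tgt_vecs[:n_dict], src_vecs[:n_dict])

# 2. Fit the regression on source-language vectors and their norms.
w, b = fit_ridge(src_vecs[:n_train], norms[:n_train])

# 3. Predict for held-out target-language words and evaluate by rank correlation.
pred = tgt_vecs[n_train:] @ R @ w + b
rho = spearman(pred, norms[n_train:])
print(f"held-out Spearman rho: {rho:.2f}")
```

With real data, the synthetic arrays would be replaced by pretrained word vectors and published norms, and the seed dictionary by actual translation pairs; the correlation obtained on this toy data says nothing about the values reported in the paper.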