Multi-time resolution analysis of speech: Evidence from psychophysics

Chait, M.; Greenberg, S.; Arai, T.; Simon, J. Z.; Poeppel, David

doi:10.3389/fnins.2015.00214

Please note that a newer version of this record is available:
https://pure.mpg.de/pubman/item/item_2321179_6

Released

Journal Article

MPG Authors

Poeppel, David
Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Max Planck Society;
New York University, New York;
University of Maryland

Full texts (freely accessible)

Multi-time resolution analysis of speech.pdf
(Publisher version), 839 KB

Citation

Chait, M., Greenberg, S., Arai, T., Simon, J. Z., & Poeppel, D. (2015). Multi-time resolution analysis of speech: Evidence from psychophysics. Frontiers in Neuroscience, 9: 214. doi:10.3389/fnins.2015.00214.

Citation link: https://hdl.handle.net/11858/00-001M-0000-002B-0E20-5

Abstract

How speech signals are analyzed and represented remains a foundational challenge for both cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10-40 Hz modulation frequency) and syllable-sized (2-10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations, corresponding to ~250 and ~30 ms (the average duration of syllables and of certain phonetic properties, respectively), were selectively extracted. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low and S_high signals. Desynchronizing the signals relative to one another had no impact on intelligibility when delays were less than ~45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility, a view compatible with recent insights from neuroscience implicating multi-timescale auditory processing.
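The correspondence the abstract draws between modulation frequencies and durations (~4 Hz with ~250 ms, ~33 Hz with ~30 ms) is simply the period of one modulation cycle. The short Python sketch below illustrates that arithmetic. The function names are hypothetical and are not taken from the paper, and the supra-additivity check assumes the simplest reading of the term (the combined score exceeds the sum of the two separate scores), which may differ from the criterion the authors applied.

```python
def modulation_period_ms(freq_hz: float) -> float:
    """Return the duration of one modulation cycle in milliseconds."""
    if freq_hz <= 0:
        raise ValueError("modulation frequency must be positive")
    return 1000.0 / freq_hz


def is_supra_additive(combined: float, low_only: float, high_only: float) -> bool:
    """Assumed criterion: combined score exceeds the sum of the separate scores."""
    return combined > low_only + high_only


# Slow modulations (~4 Hz) span roughly one syllable.
slow_period_ms = modulation_period_ms(4.0)    # 250.0 ms

# Rapid modulations (~33 Hz) span roughly 30 ms.
rapid_period_ms = modulation_period_ms(33.0)  # about 30.3 ms
```

On this reading, the ~45 ms delay tolerance reported in the abstract is longer than one cycle of the rapid modulation but well under one cycle of the slow modulation.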