
Record


Released

Conference Paper

The gesturer is the speaker

MPG Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons4454

Gebre, Binyam Gebrekidan
The Language Archive, MPI for Psycholinguistics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons216

Wittenburg, Peter
The Language Archive, MPI for Psycholinguistics, Max Planck Society;

External Resources
No external resources are available
Full Texts (freely accessible)

gesturer_is_speaker.pdf
(Publisher version), 912 KB

Supplementary Material (freely accessible)
No freely accessible supplementary materials are available
Citation

Gebre, B. G., Wittenburg, P., & Heskes, T. (2013). The gesturer is the speaker. In Proceedings of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013) (pp. 3751-3755).


Citation link: http://hdl.handle.net/11858/00-001M-0000-0010-766A-1
Abstract
We present and solve the speaker diarization problem in a novel way. We hypothesize that the gesturer is the speaker and that identifying the gesturer can be taken as identifying the active speaker. We provide evidence in support of the hypothesis from the gesture literature and from audio-visual synchrony studies. We also present a vision-only diarization algorithm that relies on gestures (i.e., upper-body movements). Experiments carried out on 8.9 hours of a publicly available dataset (the AMI meeting data) show that diarization error rates as low as 15% can be achieved.
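
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a diarization error rate can be computed. It is a simplified, frame-level illustration and not the scoring procedure used in the paper, which this record does not describe. The function name `frame_level_der`, the one-label-per-frame representation, and the toy labels are all assumptions made for the example. It also assumes at most one active speaker per frame and that hypothesis labels have already been mapped to reference speaker IDs.

```python
from typing import Optional, Sequence


def frame_level_der(
    reference: Sequence[Optional[str]],
    hypothesis: Sequence[Optional[str]],
) -> float:
    """Simplified frame-level diarization error rate (illustrative only).

    Each sequence holds one label per frame: a speaker ID, or None for
    non-speech. Assumes one speaker per frame and pre-aligned label names.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1          # speech in reference, none detected
            elif hyp != ref:
                confusion += 1       # speech detected, wrong speaker
        elif hyp is not None:
            false_alarm += 1         # speaker detected during non-speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")

    # Errors are normalised by the amount of reference speech.
    return (missed + false_alarm + confusion) / speech


# Hypothetical toy example: 6 frames, 5 of them speech in the reference.
reference = ["A", "A", "B", "B", None, "A"]
hypothesis = ["A", "B", "B", None, "B", "A"]
print(frame_level_der(reference, hypothesis))  # 1 confusion + 1 miss + 1 false alarm over 5 speech frames
```

Under this definition, a 15% error rate would mean that the missed, falsely detected, and misattributed frames together amount to 15% of the reference speech frames.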