
Record


Released

Conference Paper

Multiple Active Speaker Localization Based on Audio-visual Fusion in Two Stages

MPG Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons44529

Grochulla, Martin Peter
Computer Graphics, MPI for Informatics, Max Planck Society

http://pubman.mpdl.mpg.de/cone/persons/resource/persons45618

Thormählen, Thorsten
Computer Graphics, MPI for Informatics, Max Planck Society

External Resources
No external resources are available
Full Texts (freely accessible)
No freely accessible full texts are available
Supplementary Material (freely accessible)
No freely accessible supplementary materials are available
Citation

Li, Z., Herfet, T., Grochulla, M. P., & Thormählen, T. (2012). Multiple Active Speaker Localization Based on Audio-visual Fusion in Two Stages. In 2012 IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems (pp. 262-268). Piscataway, NJ: IEEE. doi:10.1109/MFI.2012.6343015.


Citation link: http://hdl.handle.net/11858/00-001M-0000-0014-F319-D
Abstract
Localization of multiple active speakers in natural environments with only two microphones is a challenging problem. Reverberation degrades the performance of speaker localization based exclusively on directional cues. The audio modality alone suffers from limited localization accuracy, while the video modality alone suffers from false speaker activity detections. This paper presents an approach based on audio-visual fusion in two stages. In the first stage, speaker activity is detected by audio-visual fusion, which can handle false lip movements. In the second stage, a Gaussian fusion method is proposed to integrate the estimates of both modalities. As a consequence, localization accuracy and robustness are significantly higher than with either the audio or the video modality alone. Experimental results in various scenarios confirm the improved performance of the proposed system.
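The abstract does not spell out the Gaussian fusion method, so the following is only a minimal sketch of one common way to combine two estimates under a Gaussian model: inverse-variance weighting, which is the product of two independent one-dimensional Gaussians. It is an assumption for illustration and not necessarily the method proposed in the paper. The names `Estimate` and `fuse_gaussian`, the use of a single azimuth angle, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # estimated speaker position, e.g. azimuth in degrees (assumed)
    variance: float  # uncertainty of the estimate, must be positive


def fuse_gaussian(audio: Estimate, video: Estimate) -> Estimate:
    """Fuse two independent 1-D Gaussian estimates by inverse-variance weighting.

    Illustrative only: the paper's actual fusion rule is not given in the abstract.
    """
    if audio.variance <= 0 or video.variance <= 0:
        raise ValueError("variances must be positive")
    # Each modality is weighted by its precision (inverse variance).
    w_audio = 1.0 / audio.variance
    w_video = 1.0 / video.variance
    # The fused variance is never larger than either input variance.
    fused_variance = 1.0 / (w_audio + w_video)
    fused_mean = fused_variance * (w_audio * audio.mean + w_video * video.mean)
    return Estimate(fused_mean, fused_variance)


# Hypothetical inputs: a coarse audio estimate and a sharper video estimate.
fused = fuse_gaussian(Estimate(mean=10.0, variance=4.0),
                      Estimate(mean=14.0, variance=1.0))
print(fused)
```

Under this assumed rule the fused estimate is pulled toward the modality with the smaller variance, and its variance is lower than that of either input, which is consistent with the abstract's claim that fusion improves on each modality alone.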