Record

  Audio-visual Multiple Active Speaker Localisation in Reverberant Environments

Li, Z., Herfet, T., Grochulla, M. P., & Thormählen, T. (2012). Audio-visual Multiple Active Speaker Localisation in Reverberant Environments. In Proceedings of the 15th International Conference on Digital Audio Effects (DAFx-12) (pp. 1-8). York, UK.


Files

dafx12_submission_29.pdf (any full text), 2MB
Name:
dafx12_submission_29.pdf
Description:
-
OA status:
Visibility:
Public
MIME type / checksum:
application/pdf / [MD5]
Technical metadata:
Copyright date:
-
Copyright info:
-
License:
-

Creators

Creators:
Li, Zhao (1), Author
Herfet, Thorsten (1), Author
Grochulla, Martin Peter (2), Author
Thormählen, Thorsten (2), Author
Affiliations:
(1) External Organizations, ou_persistent22
(2) Computer Graphics, MPI for Informatics, Max Planck Society, ou_40047

Content

Keywords: -
Abstract: Localisation of multiple active speakers in natural environments with only two microphones is a challenging problem. Reverberation degrades the performance of speaker localisation based exclusively on directional cues. This paper presents an approach based on audio-visual fusion. The audio modality performs the multiple speaker localisation using the Skeleton method, energy weighting, and precedence effect filtering and weighting. The video modality performs the active speaker detection based on the analysis of the lip region of the detected speakers. The audio modality alone has problems with localisation accuracy, while the video modality alone has problems with false detections. The estimation results of both modalities are represented as probabilities in the azimuth domain. A Gaussian fusion method is proposed to combine the estimates in a late stage. As a consequence, the localisation accuracy and robustness are significantly increased compared to either the audio or the video modality alone. Experimental results in different scenarios confirmed the improved performance of the proposed method.
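
Note: this record does not reproduce the paper's fusion formula. As a rough illustration of what a late Gaussian fusion of two azimuth estimates can look like, the sketch below multiplies two Gaussian densities, which yields a precision-weighted mean. The names `AzimuthEstimate` and `fuse_gaussians`, the assumption of independent estimates without angular wrap-around, and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AzimuthEstimate:
    """Hypothetical container for one modality's estimate of a speaker's azimuth."""
    mean_deg: float  # estimated azimuth in degrees
    std_deg: float   # uncertainty (standard deviation) in degrees


def fuse_gaussians(audio: AzimuthEstimate, video: AzimuthEstimate) -> AzimuthEstimate:
    """Fuse two Gaussian estimates by multiplying their densities.

    Assumes the two estimates are independent and that the azimuths are far
    enough from the +/-180 degree boundary that wrap-around can be ignored.
    """
    # Precision (inverse variance) acts as the weight of each modality.
    w_audio = 1.0 / audio.std_deg ** 2
    w_video = 1.0 / video.std_deg ** 2
    fused_mean = (w_audio * audio.mean_deg + w_video * video.mean_deg) / (w_audio + w_video)
    fused_std = (1.0 / (w_audio + w_video)) ** 0.5
    return AzimuthEstimate(fused_mean, fused_std)


# Illustrative values only: a coarse audio estimate and a sharper video estimate.
audio = AzimuthEstimate(mean_deg=22.0, std_deg=8.0)
video = AzimuthEstimate(mean_deg=18.0, std_deg=2.0)
fused = fuse_gaussians(audio, video)
print(f"fused azimuth: {fused.mean_deg:.2f} deg, std: {fused.std_deg:.2f} deg")
```

In this sketch the fused estimate is pulled towards the modality with the smaller variance, and its standard deviation is never larger than that of either input.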

Details

Language(s): eng - English
Date: 2012
Publication status: Published online
Pages: -
Place, publisher, edition: -
Table of contents: -
Review type: -
Identifiers: BibTex Citekey: Grochulla2012b
Other: Local-ID: BBAE96044E949959C1257B0C005873B8-Grochulla2012b
Degree type: -

Event

Title: 15th International Conference on Digital Audio Effects
Venue: York, UK
Start/end date: 2012-09-17 - 2012-09-21

Source 1

Title: Proceedings of the 15th International Conference on Digital Audio Effects (DAFx-12)
Short title: DAFx 2012
Source genre: Conference proceedings
Creators:
Affiliations:
Place, publisher, edition: York, UK
Pages: -
Volume / Issue: -
Article number: 29
Start / end page: 1 - 8
Identifier: -