  Audio-visual Multiple Active Speaker Localisation in Reverberant Environments

Li, Z., Herfet, T., Grochulla, M. P., & Thormählen, T. (2012). Audio-visual Multiple Active Speaker Localisation in Reverberant Environments. In Proceedings of the 15th International Conference on Digital Audio Effects (DAFx-12) (pp. 1-8). York, UK.

Files

dafx12_submission_29.pdf (Any fulltext), 2MB
Name:
dafx12_submission_29.pdf
Description:
-
OA-Status:
Visibility:
Public
MIME-Type / Checksum:
application/pdf / [MD5]
Technical Metadata:
Copyright Date:
-
Copyright Info:
-
License:
-

Creators

 Creators:
Li, Zhao (1), Author
Herfet, Thorsten (1), Author
Grochulla, Martin Peter (2), Author
Thormählen, Thorsten (2), Author
Affiliations:
(1) External Organizations, ou_persistent22
(2) Computer Graphics, MPI for Informatics, Max Planck Society, ou_40047

Content

Free keywords: -
 Abstract: Localisation of multiple active speakers in natural environments with only two microphones is a challenging problem. Reverberation degrades the performance of speaker localisation based exclusively on directional cues. This paper presents an approach based on audio-visual fusion. The audio modality performs the multiple speaker localisation using the Skeleton method, energy weighting, and precedence effect filtering and weighting. The video modality performs the active speaker detection based on the analysis of the lip region of the detected speakers. The audio modality alone has problems with localisation accuracy, while the video modality alone has problems with false detections. The estimation results of both modalities are represented as probabilities in the azimuth domain. A Gaussian fusion method is proposed to combine the estimates in a late stage. As a consequence, localisation accuracy and robustness are significantly increased compared to either the audio or the video modality alone. Experimental results in different scenarios confirm the improved performance of the proposed method.
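
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common way to combine two Gaussian azimuth estimates at a late stage: multiplying the two densities, which gives a precision-weighted mean. The `AzimuthEstimate` type, the `fuse_gaussians` function, and the example numbers are all hypothetical and are not taken from the paper. The sketch also assumes that the two estimates are independent and lie well away from the ±180 degree wrap-around.

```python
from dataclasses import dataclass


@dataclass
class AzimuthEstimate:
    """A Gaussian belief over speaker azimuth in degrees (hypothetical type)."""
    mean_deg: float
    std_deg: float


def fuse_gaussians(audio: AzimuthEstimate, video: AzimuthEstimate) -> AzimuthEstimate:
    """Fuse two independent Gaussian estimates by multiplying their densities.

    The product of two Gaussian densities is, up to a scale factor, a Gaussian
    whose mean is the precision-weighted average of the input means and whose
    variance is the inverse of the summed precisions.
    """
    precision_audio = 1.0 / audio.std_deg ** 2
    precision_video = 1.0 / video.std_deg ** 2
    fused_variance = 1.0 / (precision_audio + precision_video)
    fused_mean = fused_variance * (
        precision_audio * audio.mean_deg + precision_video * video.mean_deg
    )
    return AzimuthEstimate(fused_mean, fused_variance ** 0.5)


# Illustrative values only: a broad audio estimate and a sharper video estimate.
audio = AzimuthEstimate(mean_deg=24.0, std_deg=8.0)
video = AzimuthEstimate(mean_deg=20.0, std_deg=2.0)
fused = fuse_gaussians(audio, video)
print(f"fused azimuth: {fused.mean_deg:.2f} deg, std: {fused.std_deg:.2f} deg")
```

Under these assumptions the fused mean is pulled towards the more confident estimate, and the fused standard deviation is smaller than either input. This sketch does not model false detections from the video modality, which the abstract identifies as that modality's main weakness.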

Details

Language(s): eng - English
 Dates: 2012
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: BibTex Citekey: Grochulla2012b
Other: Local-ID: BBAE96044E949959C1257B0C005873B8-Grochulla2012b
 Degree: -

Event

Title: 15th International Conference on Digital Audio Effects
Place of Event: York, UK
Start-/End Date: 2012-09-17 - 2012-09-21

Source 1

Title: Proceedings of the 15th International Conference on Digital Audio Effects (DAFx-12)
  Abbreviation : DAFx 2012
Source Genre: Proceedings
 Creator(s):
Affiliations:
Publ. Info: York, UK
Pages: -
Volume / Issue: -
Sequence Number: 29
Start / End Page: 1 - 8
Identifier: -