Keywords:
-
Abstract:
Localizing multiple active speakers in natural environments with only two
microphones is a challenging problem. Reverberation degrades the performance of
speaker localization based exclusively on directional cues. Audio alone
provides limited localization accuracy, whereas video alone is prone to false
speaker-activity detections. This paper presents a two-stage approach based on
audio-visual fusion. In the first stage, speaker activity is detected by
audio-visual fusion, which can handle false lip movements. In the second stage,
a Gaussian fusion method is proposed to integrate the estimates of both
modalities. As a result, localization accuracy and robustness are significantly
improved compared with either modality alone. Experimental results in various
scenarios confirm the improved performance of the proposed system.
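The abstract does not state the exact Gaussian fusion rule. As an illustration only, the sketch below assumes one common choice: each modality's location estimate (for example, a speaker azimuth in degrees) is modeled as a 1-D Gaussian, and the two densities are multiplied. The result is an inverse-variance weighted mean whose variance is smaller than either input, so the more certain modality dominates. The function name `fuse_gaussian` and the example values are hypothetical and are not taken from the paper; angle wrap-around is ignored.

```python
def fuse_gaussian(mu_a, var_a, mu_v, var_v):
    """Fuse an audio and a video 1-D Gaussian estimate (hypothetical helper).

    Assumption: fusion is the normalized product of two Gaussian densities,
    i.e. inverse-variance weighting. Returns (fused_mean, fused_variance).
    """
    if var_a <= 0 or var_v <= 0:
        raise ValueError("variances must be positive")
    # Precision (inverse variance) of each modality acts as its weight.
    w_a = 1.0 / var_a
    w_v = 1.0 / var_v
    # Fused variance: precisions add, so the result is tighter than either input.
    var_f = 1.0 / (w_a + w_v)
    # Fused mean: precision-weighted average of the two estimates.
    mu_f = var_f * (w_a * mu_a + w_v * mu_v)
    return mu_f, var_f


# Hypothetical example: a coarse audio estimate (30 deg, std 10 deg)
# and a sharper video estimate (20 deg, std 2 deg).
mu, var = fuse_gaussian(30.0, 100.0, 20.0, 4.0)
print(round(mu, 2), round(var, 2))  # fused estimate lies close to the video value
```

With these numbers the fused mean is about 20.38 degrees with a variance of about 3.85, which shows how a confident video estimate can sharpen a reverberation-blurred audio estimate under this assumed rule.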