Title: Kernel-based Sensor Fusion with Application to Audio-Visual Voice Activity Detection

Apr 11, 2016

David Dov, Ronen Talmon, Israel Cohen

Abstract: In this paper, we address the problem of multiple-view data fusion in the presence of noise and interference. Recent studies have approached this problem using kernel methods, relying in particular on a product of kernels constructed separately for each view. We analyze this fusion approach in a discrete setting from a graph-theoretic point of view. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for selecting the kernel bandwidth, a parameter that, as we show, has important implications for the robustness of this fusion approach to interference. We then consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed at the face of the speaker. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interference such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.
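
To make the idea of fusing views through a product of kernels more concrete, the following is a minimal sketch and not the authors' implementation. It assumes Gaussian affinity kernels, row normalization, and a matrix product of the two per-view kernels; the abstract does not specify any of these details. The function names (gaussian_kernel, median_bandwidth, fuse_views) are hypothetical, and the median heuristic is only a common placeholder, not the bandwidth selection algorithm proposed in the paper.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x, shape (n, n)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def median_bandwidth(x):
    """Placeholder bandwidth: median of off-diagonal squared distances.

    This is a generic heuristic used here for illustration only; it is
    NOT the selection algorithm described in the paper.
    """
    d2 = pairwise_sq_dists(x)
    off_diag = d2[~np.eye(len(x), dtype=bool)]
    med = np.median(off_diag)
    return med if med > 0 else 1.0  # guard against degenerate data


def gaussian_kernel(x, bandwidth):
    """Row-normalized Gaussian affinity matrix for a single view."""
    k = np.exp(-pairwise_sq_dists(x) / bandwidth)
    return k / k.sum(axis=1, keepdims=True)


def fuse_views(audio, video):
    """Fuse two views by multiplying their separately built kernels.

    Assumption: the fusion is a matrix product of row-normalized kernels.
    audio and video hold one feature vector per frame, with rows aligned
    in time.
    """
    k_audio = gaussian_kernel(audio, median_bandwidth(audio))
    k_video = gaussian_kernel(video, median_bandwidth(video))
    return k_video @ k_audio
```

Under these assumptions, each per-view kernel is row-stochastic, so the fused matrix is row-stochastic as well and can be read as a random walk that steps through one view and then the other. The bandwidth controls how many neighbors each point is effectively connected to, which is why its choice matters for the fused graph.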
