Title: Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion

Jun 07, 2021

Baptiste Pouthier, Laurent Pilati, Leela K. Gudupudi, Charles Bouveyron, Frederic Precioso

Abstract: It is now well established from a variety of studies that combining video and audio data yields a significant benefit in detecting active speakers. However, either modality can mislead audiovisual fusion by introducing unreliable or deceptive information. This paper frames active speaker detection as a multi-objective learning problem to leverage the best of each modality using a novel self-attention, uncertainty-based multimodal fusion scheme. Results show that the proposed multi-objective learning architecture outperforms traditional approaches, improving both mAP and AUC scores. We further demonstrate that, for active speaker detection, our fusion strategy surpasses other modality fusion methods reported in various disciplines. Finally, we show that the proposed method significantly improves on the state of the art on the AVA-ActiveSpeaker dataset.

* In INTERSPEECH 2021
