Abstract: This paper introduces a novel multimodal framework for hate speech detection in deepfake audio that performs well even in zero-shot scenarios. Unlike previous approaches, our method uses contrastive learning to jointly align audio and text representations across languages. We present the first benchmark dataset with 127,290 paired text and synthesized speech samples in six languages: English and five low-resource Indian languages (Hindi, Bengali, Marathi, Tamil, Telugu). Our model learns a shared semantic embedding space, enabling robust cross-lingual and cross-modal classification. Experiments on two multilingual test sets show that our approach outperforms baselines, achieving accuracies of 0.819 and 0.701, and generalizes well to unseen languages. This demonstrates the advantage of combining modalities for hate speech detection in synthetic media, especially in low-resource settings where unimodal models falter. The dataset is available at https://www.iab-rubric.org/resources.
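To make the alignment objective mentioned in the abstract concrete, the sketch below shows a generic symmetric contrastive (InfoNCE-style) loss that pulls paired audio and text embeddings together in a shared space while pushing mismatched pairs apart. This is an illustrative assumption about the general technique, not the authors' actual implementation; the function name, embedding dimensions, and temperature value are placeholders.

```python
# Minimal sketch (assumed, not the paper's code) of a symmetric contrastive
# loss aligning audio and text embeddings in a shared semantic space.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(audio_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss over a batch of paired audio/text embeddings.

    audio_emb, text_emb: (batch, dim) tensors; row i of each is a true pair.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix; diagonal entries correspond to true pairs.
    logits = audio_emb @ text_emb.t() / temperature
    targets = torch.arange(audio_emb.size(0), device=audio_emb.device)

    # Symmetric cross-entropy: audio-to-text and text-to-audio directions.
    loss_a2t = F.cross_entropy(logits, targets)
    loss_t2a = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_a2t + loss_t2a)


if __name__ == "__main__":
    # Random embeddings stand in for the outputs of audio and text encoders.
    audio = torch.randn(8, 256)
    text = torch.randn(8, 256)
    print(contrastive_alignment_loss(audio, text).item())
```

In such a setup, a downstream hate speech classifier can operate on either modality's embedding, since both are trained to occupy the same space; this is one plausible reading of how the shared embedding enables cross-modal and cross-lingual classification.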