Hemant A. Patil

SingFox: A Multi-Lingual Singfake Detection Corpus

Jun 17, 2026

Arth J. Shah, Devanshi K. Trivedi, Himanshi U. Borad, Hemant A. Patil

Abstract: In this work, we introduce SingFox, a comprehensive, large-scale dataset designed to support robust evaluation of singing deepfake detection and source tracing systems. SingFox is divided into six distinct tracks (T1–T6), each targeting a unique form of novelty, ranging from language diversity (global and Indian) to genre-specific music and alternative fake generation methods. The dataset encompasses over 113,802 audio clips across 20 languages, totaling more than 126.32 hours of audio data and featuring 1,150 singers. Each track is designed to emulate real-world scenarios and to evaluate how reliably models perform under different conditions, thereby assessing their robustness. SingFox aims to foster reproducibility and accelerate research in singing deepfake detection by providing a reliable benchmark for both the singfake detection task and the source verification task (model explainability). In cross-dataset evaluation settings, the highest accuracy obtained is 77.84%. All code and resources required to reproduce the dataset are publicly available at https://github.com/Arth-Shah/SingFox.

* Accepted at INTERSPEECH 2026

CinC-GAN for Effective F0 prediction for Whisper-to-Normal Speech Conversion

Aug 18, 2020

Maitreya Patel, Mirali Purohit, Jui Shah, Hemant A. Patil

Abstract: Recently, Generative Adversarial Network (GAN)-based methods have shown remarkable performance for Voice Conversion and WHiSPer-to-normal SPeeCH (WHSP2SPCH) conversion. One of the key challenges in WHSP2SPCH conversion is the prediction of the fundamental frequency (F0). Recently, authors have proposed a state-of-the-art method, Cycle-Consistent Generative Adversarial Networks (CycleGAN), for WHSP2SPCH conversion. The CycleGAN-based method uses two different models, one for Mel Cepstral Coefficients (MCC) mapping and another for F0 prediction, where F0 prediction is highly dependent on the pre-trained model for MCC mapping. This leads to additional non-linear noise in the predicted F0. To suppress this noise, we propose Cycle-in-Cycle GAN (i.e., CinC-GAN). It is specially designed to increase the effectiveness of F0 prediction without losing the accuracy of MCC mapping. We evaluated the proposed method in a non-parallel setting and analyzed it on speaker-specific and gender-specific tasks. The objective and subjective tests show that CinC-GAN significantly outperforms CycleGAN. In addition, we analyze CycleGAN and CinC-GAN for unseen speakers, and the results show the clear superiority of CinC-GAN.

* Accepted at the 28th European Signal Processing Conference (EUSIPCO), 2020
