"speech": models, code, and papers

Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models

Jan 23, 2024
Chenyang Gao, Brecht Desplanques, Chelsea J.-T. Ju, Aman Chadha, Andreas Stolcke

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

Dec 22, 2023
Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

Jan 17, 2024
Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger

An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge

Jan 08, 2024
Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie

Acoustic models of Brazilian Portuguese Speech based on Neural Transformers

Dec 14, 2023
Marcelo Matheus Gauy, Marcelo Finger

Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks

Dec 15, 2023
Mike Thornton, Danilo Mandic, Tobias Reichenbach

Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

Jan 23, 2024
Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe

Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

Jan 19, 2024
Prabhav Agrawal, Thilo Koehler, Zhiping Xiu, Prashant Serai, Qing He

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Jan 16, 2024
Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis

Dec 19, 2023
Wenhao Guan, Yishuang Li, Tao Li, Hukai Huang, Feng Wang, Jiayan Lin, Lingyan Huang, Lin Li, Qingyang Hong
