"speech": models, code, and papers

Singer Identity Representation Learning using Self-Supervised Techniques

Jan 10, 2024
Bernardo Torres, Stefan Lattner, Gaël Richard

MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector

Jan 10, 2024
Marta R. Costa-jussà, Mariano Coria Meglioli, Pierre Andrews, David Dale, Prangthip Hansanti, Elahe Kalbassi, Alex Mourachko, Christophe Ropers, Carleigh Wood

Efficient speech detection in environmental audio using acoustic recognition and knowledge distillation

Dec 14, 2023
Drew Priebe, Burooj Ghani, Dan Stowell

Low-latency Speech Enhancement via Speech Token Generation

Oct 20, 2023
Huaying Xue, Xiulian Peng, Yan Lu

Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

Jan 08, 2024
Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

Dec 26, 2023
Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

Audiobox: Unified Audio Generation with Natural Language Prompts

Dec 25, 2023
Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu

FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation

Jan 08, 2024
Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei

Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

Oct 12, 2023
Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie

1SPU: 1-step Speech Processing Unit

Nov 10, 2023
Karan Singla, Shahab Jalalvand, Yeon-Jun Kim, Antonio Moreno Daniel, Srinivas Bangalore, Andrej Ljolje, Ben Stern
