"speech": models, code, and papers

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Feb 01, 2024
Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Barry-John Theobald

A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings

Feb 15, 2024
Hyewon Han, Naveen Kumar

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Feb 06, 2024
Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun

SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition

Jan 18, 2024
Hao Wang, Shuhei Kurita, Shuichiro Shimizu, Daisuke Kawahara

GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model

Feb 09, 2024
Haocheng Liu, Teysir Baoueb, Mathieu Fontaine, Jonathan Le Roux, Gael Richard

Exploring the limits of decoder-only models trained on public speech recognition corpora

Jan 31, 2024
Ankit Gupta, George Saon, Brian Kingsbury

Exploratory Evaluation of Speech Content Masking

Jan 08, 2024
Jennifer Williams, Karla Pizzi, Paul-Gauthier Noe, Sneha Das

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Jan 18, 2024
Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide

SALAD: Smart AI Language Assistant Daily

Feb 13, 2024
Ragib Amin Nihal, Tran Dong Huu Quoc, Lin Zirui, Xu Yimimg, Liu Haoran, An Zhaoyi, Kyou Ma

A New Approach to Voice Authenticity

Feb 09, 2024
Nicolas M. Müller, Piotr Kawa, Shen Hu, Matthias Neu, Jennifer Williams, Philip Sperl, Konstantin Böttinger
