"speech": models, code, and papers

Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

Jan 23, 2024
Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe

Stateful FastConformer with Cache-based Inference for Streaming Automatic Speech Recognition

Dec 27, 2023
Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

Improving Cross-Domain Hate Speech Generalizability with Emotion Knowledge

Nov 24, 2023
Shi Yin Hong, Susan Gauch

Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler

Dec 05, 2023
Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May

Hate Speech and Offensive Content Detection in Indo-Aryan Languages: A Battle of LSTM and Transformers

Dec 09, 2023
Nikhil Narayan, Mrutyunjay Biswal, Pramod Goyal, Abhranta Panigrahi

3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing

Dec 01, 2023
Balamurugan Thambiraja, Sadegh Aliakbarian, Darren Cosker, Justus Thies

BLSTM-Based Confidence Estimation for End-to-End Speech Recognition

Dec 22, 2023
Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix

On real-time multi-stage speech enhancement systems

Dec 19, 2023
Lingjun Meng, Jozef Coldenhoff, Paul Kendrick, Tijana Stojkovic, Andrew Harper, Kiril Ratmanski, Milos Cernak

Multi-Task Learning for Front-End Text Processing in TTS

Jan 12, 2024
Wonjune Kang, Yun Wang, Shun Zhang, Arthur Hinsvark, Qing He

Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition

Dec 15, 2023
Tzu-Ting Yang, Hsin-Wei Wang, Berlin Chen
