"speech": models, code, and papers

UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge

Mar 20, 2024
Wataru Nakata, Kazuki Yamauchi, Dong Yang, Hiroaki Hyodo, Yuki Saito

MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation

Mar 19, 2024
Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

Mar 21, 2024
HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang

An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis

Mar 19, 2024
Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong

Driving Animatronic Robot Facial Expression From Speech

Mar 21, 2024
Boren Li, Hang Li, Hangxin Liu

Advanced Long-Content Speech Recognition With Factorized Neural Transducer

Mar 20, 2024
Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

Hallucination in Perceptual Metric-Driven Speech Enhancement Networks

Mar 18, 2024
George Close, Thomas Hain, Stefan Goetze

Building speech corpus with diverse voice characteristics for its prompt-based representation

Mar 20, 2024
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari

Wav2Gloss: Generating Interlinear Glossed Text from Speech

Mar 19, 2024
Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori Levin

Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales

Mar 19, 2024
Ayushi Nirmal, Amrita Bhattacharjee, Paras Sheth, Huan Liu
