"speech": models, code, and papers

Open-vocabulary keyword spotting in any language through multilingual contrastive speech-phoneme pretraining

Nov 14, 2023
Jian Zhu, Farhan Samir, Changbing Yang, Jahurul Islam

Towards Online Sign Language Recognition and Translation

Jan 10, 2024
Ronglai Zuo, Fangyun Wei, Brian Mak

Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues

Jan 05, 2024
David Gimeno-Gómez, Ana-Maria Bucur, Adrian Cosma, Carlos-David Martínez-Hinarejos, Paolo Rosso

Low-latency Speech Enhancement via Speech Token Generation

Oct 13, 2023
Huaying Xue, Xiulian Peng, Yan Lu

On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition

Nov 14, 2023
Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda

SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems

Jan 08, 2024
Dong Zhang, Zhaowei Li, Pengyu Wang, Xin Zhang, Yaqian Zhou, Xipeng Qiu

On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition

Dec 15, 2023
Nagaraj Adiga, Jinhwan Park, Chintigari Shiva Kumar, Shatrughan Singh, Kyungmin Lee, Chanwoo Kim, Dhananjaya Gowda

Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

Nov 12, 2023
Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

CDSD: Chinese Dysarthria Speech Database

Oct 24, 2023
Mengyi Sun, Ming Gao, Xinchen Kang, Shiru Wang, Jun Du, Dengfeng Yao, Su-Jing Wang

Generative De-Quantization for Neural Speech Codec via Latent Diffusion

Nov 15, 2023
Haici Yang, Inseon Jang, Minje Kim