
"speech": models, code, and papers

AudioSlots: A slot-centric generative model for audio separation

May 09, 2023
Pradyumna Reddy, Scott Wisdom, Klaus Greff, John R. Hershey, Thomas Kipf

Via arXiv

Towards Improved Room Impulse Response Estimation for Speech Recognition

Nov 08, 2022
Anton Ratnarajah, Ishwarya Ananthabhotla, Vamsi Krishna Ithapu, Pablo Hoffmann, Dinesh Manocha, Paul Calamia

Via arXiv

Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search

Nov 16, 2022
Zihan Wang, Qi Meng, HaiFeng Lan, XinRui Zhang, KeHao Guo, Akshat Gupta

Via arXiv

Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss

Apr 12, 2023
Zhiyuan Zhao, Lijun Wu, Chuanxin Tang, Dacheng Yin, Yucheng Zhao, Chong Luo

Via arXiv

The Ability of Self-Supervised Speech Models for Audio Representations

Sep 28, 2022
Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsu-Yuan Hsu, Hung-Yi Lee

Via arXiv

ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition

Oct 24, 2022
Sanchit Gandhi, Patrick von Platen, Alexander M. Rush

Via arXiv

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

Sep 30, 2022
Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Lirong Dai, Jinyu Li, Furu Wei

Via arXiv

Fast Yet Effective Speech Emotion Recognition with Self-distillation

Oct 26, 2022
Zhao Ren, Thanh Tam Nguyen, Yi Chang, Björn W. Schuller

Via arXiv

Rate-Adaptive Coding Mechanism for Semantic Communications With Multi-Modal Data

May 18, 2023
Yangshuo He, Guanding Yu, Yunlong Cai

Via arXiv

Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder

Dec 16, 2022
Yusuke Yasuda, Tomoki Toda

Via arXiv