"speech": models, code, and papers

HuBERT-TR: Reviving Turkish Automatic Speech Recognition with Self-supervised Speech Representation Learning

Oct 13, 2022
Ali Safaya, Engin Erzin

Via arXiv

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Oct 03, 2022
Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath

Via arXiv

End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders

May 04, 2023
Jixuan Wang, Martin Radfar, Kai Wei, Clement Chung

Via arXiv

AudioSlots: A slot-centric generative model for audio separation

May 09, 2023
Pradyumna Reddy, Scott Wisdom, Klaus Greff, John R. Hershey, Thomas Kipf

Via arXiv

A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge

May 02, 2023
Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe

Via arXiv

An ASR-free Fluency Scoring Approach with Self-Supervised Learning

Mar 13, 2023
Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

Via arXiv

Fast and efficient speech enhancement with variational autoencoders

Nov 02, 2022
Mostafa Sadeghi, Romain Serizel

Via arXiv

Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss

Apr 12, 2023
Zhiyuan Zhao, Lijun Wu, Chuanxin Tang, Dacheng Yin, Yucheng Zhao, Chong Luo

Via arXiv

Improving Speech Enhancement through Fine-Grained Speech Characteristics

Jul 11, 2022
Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

Via arXiv

Qualitative Analysis of a Graph Transformer Approach to Addressing Hate Speech: Adapting to Dynamically Changing Content

Jan 25, 2023
Liam Hebert, Hong Yi Chen, Robin Cohen, Lukasz Golab

Via arXiv