"speech": models, code, and papers

Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness

Jul 05, 2023
Carsten Hartmann, Lorenz Richter

RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness

Feb 23, 2023
Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities

May 19, 2023
Dong Zhang, Shimin Li, Xin Zhang, Jun Zhan, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems

Jun 03, 2023
Manuele Rusci, Tinne Tuytelaars

Warning: Humans Cannot Reliably Detect Speech Deepfakes

Jan 19, 2023
Kimberly T. Mai, Sergi D. Bray, Toby Davies, Lewis D. Griffin

Leveraging World Knowledge in Implicit Hate Speech Detection

Dec 28, 2022
Jessica Lin

Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model

Jun 22, 2023
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition

Mar 20, 2023
Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland

Time-Domain Speech Enhancement for Robust Automatic Speech Recognition

Oct 27, 2022
Yufeng Yang, Ashutosh Pandey, DeLiang Wang

Dialog act guided contextual adapter for personalized speech recognition

Mar 31, 2023
Feng-Ju Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant P. Strimel, Ross McGowan
