"speech": models, code, and papers

Modeling Speaker-Listener Interaction for Backchannel Prediction

Apr 10, 2023
Daniel Ortega, Sarina Meyer, Antje Schweitzer, Ngoc Thang Vu

Speaker and Language Change Detection using Wav2vec2 and Whisper

Feb 18, 2023
Tijn Berns, Nik Vaessen, David A. van Leeuwen

Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

Mar 31, 2022
Karren Yang, Dejan Markovic, Steven Krenn, Vasu Agrawal, Alexander Richard

An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition

Oct 13, 2022
Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee

Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition

Nov 28, 2022
Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim

An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition

Oct 12, 2022
Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

May 01, 2023
Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao

Non-Asymptotic Pointwise and Worst-Case Bounds for Classical Spectrum Estimators

Mar 21, 2023
Andrew Lamperski

Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions

Jun 30, 2022
Urja Khurana, Ivar Vermeulen, Eric Nalisnick, Marloes van Noorloos, Antske Fokkens

HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch

Oct 18, 2022
Tina Raissi, Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney
