"speech": models, code, and papers

QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation

May 18, 2023
Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang

Figures 1–4

Automatic Annotation of Direct Speech in Written French Narratives

Jun 28, 2023
Noé Durandard, Viet-Anh Tran, Gaspard Michel, Elena V. Epure

Figures 1–4

Scaling Laws for Discriminative Speech Recognition Rescoring Models

Jun 27, 2023
Yile Gu, Prashanth Gurunath Shivakumar, Jari Kolehmainen, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

Figures 1–4

Quantifying the perceptual value of lexical and non-lexical channels in speech

Jul 07, 2023
Sarenne Wallbridge, Peter Bell, Catherine Lai

Figures 1–3

Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration

Jun 09, 2023
Kinan Martin, Jon Gauthier, Canaan Breiss, Roger Levy

Figures 1–4

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis

Jun 02, 2023
Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann

Figures 1–4

ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment

Aug 28, 2023
Yicheng Zhong, Huawei Wei, Peiji Yang, Zhisheng Wang

Figures 1–4

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

May 25, 2023
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami

Figures 1–4

A Study on the Reliability of Automatic Dysarthric Speech Assessments

Jun 07, 2023
Xavier F. Cadet, Ranya Aloufi, Sara Ahmadi-Abhari, Hamed Haddadi

Figures 1–4

On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition

May 21, 2023
Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju

Figures 1–3