"speech": models, code, and papers

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

Mar 29, 2023
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

May 21, 2023
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

Cross-lingual Prosody Transfer for Expressive Machine Dubbing

Jun 20, 2023
Jakub Swiatkowski, Duo Wang, Mikolaj Babianski, Patrick Lumban Tobing, Ravichander Vipperla, Vincent Pollet

FPGA Resource-aware Structured Pruning for Real-Time Neural Networks

Aug 09, 2023
Benjamin Ramhorst, George A. Constantinides, Vladimir Loncar

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

Aug 07, 2023
Xiaohui Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Chuyuan Zhang

Expressive Machine Dubbing Through Phrase-level Cross-lingual Prosody Transfer

Jun 21, 2023
Jakub Swiatkowski, Duo Wang, Mikolaj Babianski, Giuseppe Coccia, Patrick Lumban Tobing, Ravichander Vipperla, Viacheslav Klimkov, Vincent Pollet

Enhancing multilingual speech recognition in air traffic control by sentence-level language identification

Apr 29, 2023
Peng Fan, Dongyue Guo, JianWei Zhang, Bo Yang, Yi Lin

Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing

Feb 25, 2023
Alexandra Chronopoulou, Brian Thompson, Prashant Mathur, Yogesh Virkar, Surafel M. Lakew, Marcello Federico

Learning Robust Self-attention Features for Speech Emotion Recognition with Label-adaptive Mixup

May 07, 2023
Lei Kang, Lichao Zhang, Dazhi Jiang

STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization

Jun 18, 2023
Kyle Min
