"speech": models, code, and papers

Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet

Oct 05, 2021
Max Morrison, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

Via arXiv

Towards Realistic Visual Dubbing with Heterogeneous Sources

Jan 17, 2022
Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma

Via arXiv

VScript: Controllable Script Generation with Audio-Visual Presentation

Mar 01, 2022
Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske, Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung

Via arXiv

Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech

Aug 03, 2020
Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff

Via arXiv

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Jun 03, 2021
Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

Via arXiv

The slurk Interaction Server Framework: Better Data for Better Dialog Models

Feb 02, 2022
Jana Götze, Maike Paetzel-Prüsmann, Wencke Liermann, Tim Diekmann, David Schlangen

Via arXiv

A Comparison of Discrete Latent Variable Models for Speech Representation Learning

Oct 24, 2020
Henry Zhou, Alexei Baevski, Michael Auli

Via arXiv

Breaking the Data Barrier: Towards Robust Speech Translation via Adversarial Stability Training

Sep 25, 2019
Qiao Cheng, Meiyuan Fang, Yaqian Han, Jin Huang, Yitao Duan

Via arXiv

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

Oct 07, 2021
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Via arXiv

Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation

Jul 04, 2021
Ryo Masumura, Daiki Okamura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi

Via arXiv