"speech": models, code, and papers

The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge

Mar 11, 2023
Pengcheng Guo, He Wang, Bingshen Mu, Ao Zhang, Peikun Chen

Via arXiv

Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings

Mar 13, 2023
Joel Shor, Ruyue Agnes Bi, Subhashini Venugopalan, Steven Ibara, Roman Goldenberg, Ehud Rivlin

Via arXiv

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition

Feb 02, 2023
HoLam Chung, Junan Li, Pengfei Liu, Wai-Kim Leung, Xixin Wu, Helen Meng

Via arXiv

Efficient Incremental Text-to-Speech on GPUs

Dec 05, 2022
Muyang Du, Chuan Liu, Jiaxing Qi, Junjie Lai

Via arXiv

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

Oct 18, 2022
Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro Moreno, Nanxin Chen

Via arXiv

PSST! Prosodic Speech Segmentation with Transformers

Feb 03, 2023
Nathan Roll, Calbert Graham, Simon Todd

Via arXiv

Simulating realistic speech overlaps improves multi-talker ASR

Nov 17, 2022
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Via arXiv

D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement

Feb 23, 2023
Shengkui Zhao, Bin Ma

Via arXiv

Improving Language Model Integration for Neural Machine Translation

Jun 08, 2023
Christian Herold, Yingbo Gao, Mohammad Zeineldeen, Hermann Ney

Via arXiv

Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

Jun 05, 2023
Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin

Via arXiv