"speech": models, code, and papers

Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models

Dec 19, 2022
Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna


Adapting an Unadaptable ASR System

Jun 01, 2023
Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill


Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder

Mar 26, 2023
Hao Shi, Masato Mimura, Longbiao Wang, Jianwu Dang, Tatsuya Kawahara


Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation

May 29, 2023
Yui Sudo, Kazuya Hata, Kazuhiro Nakadai


APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra

May 13, 2023
Yang Ai, Zhen-Hua Ling


PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

Mar 30, 2023
Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko


Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation

Jun 27, 2023
Haitao Tang, Yu Fu, Lei Sun, Jiabin Xue, Dan Liu, Yongchao Li, Zhiqiang Ma, Minghui Wu, Jia Pan, Genshun Wan, Ming'en Zhao


Using a Large Language Model to Control Speaking Style for Expressive TTS

May 17, 2023
Atli Thor Sigurgeirsson, Simon King


Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition

Feb 16, 2023
Minsu Kim, Hyung-Il Kim, Yong Man Ro


I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Mar 14, 2023
Yifan Peng, Jaesong Lee, Shinji Watanabe
