
Jinyu Li

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

Nov 10, 2022
Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang

Speech separation with large-scale self-supervised learning

Nov 09, 2022
Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Nov 07, 2022
Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

Nov 05, 2022
Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

Nov 04, 2022
Jian Xue, Peidong Wang, Jinyu Li, Eric Sun

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

Oct 31, 2022
Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He, Jinyu Li, Furu Wei

Simulating realistic speech overlaps improves multi-talker ASR

Oct 27, 2022
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding

Oct 16, 2022
Ruchao Fan, Guoli Ye, Yashesh Gaur, Jinyu Li

CTCBERT: Advancing Hidden-unit BERT with CTC Objectives

Oct 16, 2022
Ruchao Fan, Yiming Wang, Yashesh Gaur, Jinyu Li

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

Oct 07, 2022
Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei