Jinyu Li

Beijing Institute of Technology, China

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Nov 07, 2022

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

Nov 05, 2022

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

Nov 04, 2022

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

Oct 31, 2022

CTCBERT: Advancing Hidden-unit BERT with CTC Objectives

Oct 16, 2022

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding

Oct 16, 2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

Oct 07, 2022

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

Sep 30, 2022

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

Sep 12, 2022

DeFlowSLAM: Self-Supervised Scene Motion Decomposition for Dynamic Dense SLAM

Jul 18, 2022