Picture for Jinyu Li

Jinyu Li

Fred

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Add code
Jan 05, 2023
Figure 1 for Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Figure 2 for Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Figure 3 for Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Figure 4 for Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Viaarxiv icon

Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

Add code
Dec 05, 2022
Viaarxiv icon

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Add code
Nov 21, 2022
Figure 1 for VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Figure 2 for VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Figure 3 for VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Figure 4 for VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Viaarxiv icon

Simulating realistic speech overlaps improves multi-talker ASR

Add code
Nov 17, 2022
Viaarxiv icon

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

Add code
Nov 17, 2022
Viaarxiv icon

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

Add code
Nov 10, 2022
Viaarxiv icon

Speech separation with large-scale self-supervised learning

Add code
Nov 09, 2022
Viaarxiv icon

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Add code
Nov 07, 2022
Figure 1 for Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Figure 2 for Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Figure 3 for Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Figure 4 for Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Viaarxiv icon

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

Add code
Nov 05, 2022
Figure 1 for LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Figure 2 for LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Figure 3 for LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Figure 4 for LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Viaarxiv icon

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

Add code
Nov 04, 2022
Viaarxiv icon