Jinyu Li

Beijing Institute of Technology, China

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Aug 14, 2023
Via arXiv

Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text

Jul 30, 2023
Via arXiv

On decoder-only architecture for speech-to-text and large language model integration

Jul 14, 2023
Via arXiv

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Jul 07, 2023
Via arXiv

Accelerating Transducers through Adjacent Token Merging

Jun 28, 2023
Via arXiv

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

Jun 28, 2023
Via arXiv

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

May 31, 2023
Via arXiv

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

May 25, 2023
Via arXiv

PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds

May 08, 2023
Via arXiv

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

Mar 07, 2023
Via arXiv