Picture for Jinyu Li

Jinyu Li

Fred

Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios

Add code
Jun 17, 2025
Viaarxiv icon

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling

Add code
Jun 14, 2025
Viaarxiv icon

Discrete Audio Tokens: More Than a Survey!

Add code
Jun 12, 2025
Viaarxiv icon

PHRASED: Phrase Dictionary Biasing for Speech Translation

Add code
Jun 10, 2025
Viaarxiv icon

Lightweight Prompt Biasing for Contextualized End-to-End ASR Systems

Add code
Jun 06, 2025
Viaarxiv icon

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

Add code
Jun 04, 2025
Viaarxiv icon

Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation

Add code
Jun 04, 2025
Viaarxiv icon

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Add code
Apr 14, 2025
Viaarxiv icon

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Add code
Mar 03, 2025
Viaarxiv icon

Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning

Add code
Feb 19, 2025
Viaarxiv icon