Yuxuan Hu

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

Jun 04, 2025, via arXiv

Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation

Jun 04, 2025, via arXiv

MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation

May 23, 2025, via arXiv

MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis

May 21, 2025, via arXiv

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Apr 14, 2025, via arXiv

QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition

Mar 25, 2025, via arXiv

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Mar 03, 2025, via arXiv

LoRS: Efficient Low-Rank Adaptation for Sparse Large Language Model

Jan 15, 2025, via arXiv

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Dec 23, 2024, via arXiv

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Dec 20, 2024, via arXiv