Jian Luan

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding

Mar 25, 2026, via arXiv

The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models

Mar 24, 2026, via arXiv

Borderless Long Speech Synthesis

Mar 20, 2026, via arXiv

ExPosST: Explicit Positioning with Adaptive Masking for LLM-Based Simultaneous Machine Translation

Mar 16, 2026, via arXiv

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Mar 12, 2026, via arXiv

IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation

Mar 11, 2026, via arXiv

From Ideal to Real: Stable Video Object Removal under Imperfect Conditions

Mar 10, 2026, via arXiv

EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models

Feb 27, 2026, via arXiv

CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning

Feb 27, 2026, via arXiv

DashengTokenizer: One layer is enough for unified audio understanding and generation

Feb 27, 2026, via arXiv