Jian Luan

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition

Sep 19, 2025

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Sep 19, 2025

Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios

Aug 27, 2025

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

Aug 07, 2025

Attention Basin: Why Contextual Position Matters in Large Language Models

Aug 07, 2025

MiDashengLM: Efficient Audio Understanding with General Audio Captions

Aug 06, 2025

Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders

Jun 13, 2025

GLAP: General contrastive audio-text pretraining across domains and languages

Jun 12, 2025

BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism

May 27, 2025

TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization

May 26, 2025