Picture for Chenliang Xu

Chenliang Xu

ZeroSep: Separate Anything in Audio with Zero Training

Add code
May 29, 2025
Viaarxiv icon

BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models

Add code
May 28, 2025
Viaarxiv icon

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Add code
May 26, 2025
Viaarxiv icon

$I^2G$: Generating Instructional Illustrations via Text-Conditioned Diffusion

Add code
May 22, 2025
Viaarxiv icon

Intentional Gesture: Deliver Your Intentions with Gestures for Speech

Add code
May 21, 2025
Viaarxiv icon

Learning to Highlight Audio by Watching Movies

Add code
May 17, 2025
Viaarxiv icon

The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability

Add code
Apr 15, 2025
Viaarxiv icon

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Add code
Apr 09, 2025
Viaarxiv icon

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Add code
Apr 04, 2025
Viaarxiv icon

FreSca: Unveiling the Scaling Space in Diffusion Models

Add code
Apr 02, 2025
Viaarxiv icon