Jiabing Yang

Improving Vision-Language-Action Model Fine-Tuning with Structured Stage and Keyframe Supervision

Jun 25, 2026
Via arXiv

When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection

Jun 02, 2026
Via arXiv

SKIP: Sparse Keyframe Interpolation Paradigm for Efficient Embodied World Models

May 30, 2026
Via arXiv

Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model

Apr 03, 2026
Via arXiv

Beyond Closed-Pool Video Retrieval: A Benchmark and Agent Framework for Real-World Video Search and Moment Localization

Feb 10, 2026
Via arXiv

PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG

Feb 05, 2026
Via arXiv

BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks

Feb 03, 2026
Via arXiv

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Jan 30, 2026
Via arXiv

ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models

Jan 29, 2026
Via arXiv

AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs

Oct 08, 2025
Via arXiv