Xiawu Zheng

Motion-Aware Caching for Efficient Autoregressive Video Generation

May 03, 2026

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Apr 06, 2026

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Mar 24, 2026

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Mar 17, 2026

Event-Anchored Frame Selection for Effective Long-Video Understanding

Mar 01, 2026

Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding

Feb 28, 2026

Flow caching for autoregressive video generation

Feb 11, 2026

Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

Nov 19, 2025

Polybasic Speculative Decoding Through a Theoretical Perspective

Oct 30, 2025

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

Jul 30, 2025