Xiawu Zheng

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Mar 17, 2026
Via arXiv

Event-Anchored Frame Selection for Effective Long-Video Understanding

Mar 01, 2026
Via arXiv

Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding

Feb 28, 2026
Via arXiv

Flow caching for autoregressive video generation

Feb 11, 2026
Via arXiv

Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

Nov 19, 2025
Via arXiv

Polybasic Speculative Decoding Through a Theoretical Perspective

Oct 30, 2025
Via arXiv

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

Jul 30, 2025
Via arXiv

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

May 28, 2025
Via arXiv

Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective

May 28, 2025
Via arXiv

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension

Mar 11, 2025
Via arXiv