Xiangyu Zeng

VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

Mar 24, 2026
Via arXiv

HiCI: Hierarchical Construction-Integration for Long-Context Attention

Mar 21, 2026
Via arXiv

RIVER: A Real-Time Interaction Benchmark for Video LLMs

Mar 04, 2026
Via arXiv

Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning

Jan 30, 2026
Via arXiv

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Apr 10, 2025
Via arXiv

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Mar 18, 2025
Via arXiv

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

Jan 21, 2025
Via arXiv

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Dec 31, 2024
Via arXiv

Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method

Dec 31, 2024
Via arXiv

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Dec 26, 2024
Via arXiv