Xinhao Li

End-to-End Test-Time Training for Long Context
Dec 31, 2025 (via arXiv)

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
Dec 16, 2025 (via arXiv)

Diving into Mitigating Hallucinations from a Vision Perspective for Large Vision-Language Models
Sep 17, 2025 (via arXiv)

VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
May 29, 2025 (via arXiv)

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Apr 10, 2025 (via arXiv)

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Jan 21, 2025 (via arXiv)

Fine-grained Video-Text Retrieval: A New Benchmark and Method
Dec 31, 2024 (via arXiv)

Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method
Dec 31, 2024 (via arXiv)

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Dec 31, 2024 (via arXiv)

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Dec 26, 2024 (via arXiv)