Xinhao Li

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Feb 02, 2026

Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning

Jan 30, 2026

Learning to Discover at Test Time

Jan 22, 2026

End-to-End Test-Time Training for Long Context

Dec 31, 2025

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Dec 16, 2025

Diving into Mitigating Hallucinations from a Vision Perspective for Large Vision-Language Models

Sep 17, 2025

VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

May 29, 2025

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Apr 10, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

Jan 21, 2025