Limin Wang

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Feb 02, 2026

GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates

Jan 31, 2026

Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning

Jan 30, 2026

VMonarch: Efficient Video Diffusion Transformers with Structured Attention

Jan 29, 2026

Towards Pixel-Level VLM Perception via Simple Points Prediction

Jan 27, 2026

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Dec 16, 2025

SAM 2++: Tracking Anything at Any Granularity

Oct 22, 2025

Arbitrary Generative Video Interpolation

Oct 01, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025

MobileViCLIP: An Efficient Video-Text Model for Mobile Devices

Aug 10, 2025