Shihao Wang

Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

Mar 05, 2026

PhyCritic: Multimodal Critic Models for Physical AI

Feb 11, 2026

Learning Heat-based Equations in Self-similar variables

Feb 03, 2026

ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation

Jan 31, 2026

DP$^2$O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution

Oct 21, 2025

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding

Jul 17, 2025

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Apr 21, 2025

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

Apr 14, 2025

OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

Apr 06, 2025

Slow-Fast Architecture for Video Multi-Modal Large Language Models

Apr 02, 2025