
Yanwei Li

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

Jun 05, 2026
Via arXiv

Benchmarking and Evolving Reason-Reflect-Rectify for Reflective Visual Generation

May 19, 2026
Via arXiv

Semantic Generative Tuning for Unified Multimodal Models

May 18, 2026
Via arXiv

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

Apr 24, 2026
Via arXiv

One Loss to Rule Them All: Marked Time-to-Event for Structured EHR Foundation Models

Jan 31, 2026
Via arXiv

Visual Spatial Tuning

Nov 07, 2025
Via arXiv

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025
Via arXiv

Aligning Effective Tokens with Video Anomaly in Large Language Models

Aug 08, 2025
Via arXiv

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

May 30, 2025
Via arXiv

FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records

May 22, 2025
Via arXiv