
Zhuowei Li

VisRef: Visual Refocusing while Thinking Improves Test-Time Scaling in Multi-Modal Large Reasoning Models

Feb 27, 2026

Decoupling Vision and Language: Codebook Anchored Visual Adaptation

Feb 23, 2026

T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization

Feb 13, 2026

Token-Level Uncertainty Estimation for Large Language Model Reasoning

May 16, 2025

Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing

Mar 29, 2025

Show and Segment: Universal Medical Image Segmentation via In-Context Learning

Mar 25, 2025

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

Feb 05, 2025

MLLM-as-a-Judge for Image Safety without Human Labeling

Dec 31, 2024

Implicit In-context Learning

May 23, 2024

GAgent: An Adaptive Rigid-Soft Gripping Agent with Vision Language Models for Complex Lighting Environments

Mar 16, 2024