Hengshuang Zhao

Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection

Dec 18, 2025

In Pursuit of Pixel Supervision for Visual Pre-training

Dec 17, 2025

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

Dec 16, 2025

GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation

Dec 14, 2025

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

Dec 14, 2025

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Dec 09, 2025

Seg-VAR: Image Segmentation with Visual Autoregressive Modeling

Nov 16, 2025

Visual Spatial Tuning

Nov 07, 2025

From Noisy Traces to Stable Gradients: Bias-Variance Optimized Preference Optimization for Aligning Large Reasoning Models

Oct 06, 2025

DiffCamera: Arbitrary Refocusing on Images

Sep 30, 2025