Hao Li

Jack

Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets

May 12, 2025
Via arXiv

SOAP: Style-Omniscient Animatable Portraits

May 08, 2025
Via arXiv

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding

May 08, 2025
Via arXiv

Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World

May 07, 2025
Via arXiv

A machine learning model for skillful climate system prediction

May 06, 2025
Via arXiv

Optimization of Module Transferability in Single Image Super-Resolution: Universality Assessment and Cycle Residual Blocks

May 06, 2025
Via arXiv

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

May 01, 2025
Via arXiv

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Apr 30, 2025
Via arXiv

TarDiff: Target-Oriented Diffusion Guidance for Synthetic Electronic Health Record Time Series Generation

Apr 24, 2025
Via arXiv

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Apr 15, 2025
Via arXiv