Picture for Jianhua Han

Jianhua Han

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

Add code
Jul 22, 2025
Viaarxiv icon

Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs

Add code
Jun 06, 2025
Viaarxiv icon

SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

Add code
May 25, 2025
Viaarxiv icon

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Add code
Apr 03, 2025
Viaarxiv icon

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

Add code
Mar 09, 2025
Viaarxiv icon

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

Add code
Mar 08, 2025
Viaarxiv icon

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

Add code
Feb 21, 2025
Figure 1 for TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba
Figure 2 for TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba
Figure 3 for TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba
Figure 4 for TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba
Viaarxiv icon

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

Add code
Dec 09, 2024
Viaarxiv icon

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning

Add code
Nov 18, 2024
Figure 1 for AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning
Figure 2 for AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning
Figure 3 for AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning
Figure 4 for AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning
Viaarxiv icon

VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation

Add code
Nov 14, 2024
Viaarxiv icon