Han Lin

Exploring MLLM-Diffusion Information Transfer with MetaCanvas

Dec 12, 2025

Error-Driven Scene Editing for 3D Grounding in Large Language Models

Nov 18, 2025

EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance

May 28, 2025

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

Apr 11, 2025

DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

Nov 25, 2024

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

Oct 04, 2024

Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers

Jun 22, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Apr 24, 2024

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Apr 15, 2024

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents

Mar 18, 2024