Hang Xu

Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution

Jun 15, 2025

Semantic-decoupled Spatial Partition Guided Point-supervised Oriented Object Detection

Jun 12, 2025

Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs

Jun 06, 2025

SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

May 25, 2025

CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback

Apr 28, 2025

PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning

Apr 08, 2025

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Apr 03, 2025

From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D

Mar 29, 2025

DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation

Mar 27, 2025

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Mar 20, 2025