Dave Zhenyu Chen

Reliev3R: Relieving Feed-forward Reconstruction from Multi-View Geometric Annotations

Apr 01, 2026
Via arXiv

GAP-MLLM: Geometry-Aligned Pre-training for Activating 3D Spatial Perception in Multimodal Large Language Models

Mar 17, 2026
Via arXiv

VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection

Mar 01, 2026
Via arXiv

Map2Thought: Explicit 3D Spatial Reasoning via Metric Cognitive Maps

Jan 16, 2026
Via arXiv

Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs

Jun 06, 2025
Via arXiv

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs

Mar 07, 2025
Via arXiv

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

May 16, 2024
Via arXiv

EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

May 02, 2024
Via arXiv

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

Nov 28, 2023
Via arXiv

Generating Context-Aware Natural Answers for Questions in 3D Scenes

Oct 30, 2023
Via arXiv