
Xuweiyi Chen

Multi-Object Hallucination in Vision-Language Models

Jul 08, 2024

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jun 12, 2024

3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs

Jun 07, 2024

UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control

Mar 06, 2024

LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

Sep 21, 2023