Zhihao Yuan

See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering

Jul 23, 2025
Via arXiv

Empowering Large Language Models with 3D Situation Awareness

Mar 29, 2025
Via arXiv

PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models

Mar 13, 2025
Via arXiv

Generative Semantic Communication for Text-to-Speech Synthesis

Oct 04, 2024
Via arXiv

Instance-free Text to Point Cloud Localization with Relative Position Awareness

Apr 27, 2024
Via arXiv

GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance

Dec 12, 2023
Via arXiv

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

Nov 26, 2023
Via arXiv

Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases

Jul 05, 2022
Via arXiv

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Apr 06, 2022
Via arXiv

CLEVR3D: Compositional Language and Elementary Visual Reasoning for Question Answering in 3D Real-World Scenes

Dec 31, 2021
Via arXiv