Zhidong Deng

Beyond Language: Grounding Referring Expressions with Hand Pointing in Egocentric Vision

Mar 27, 2026
Via arXiv

Exploring Timeline Control for Facial Motion Generation

May 27, 2025
Via arXiv

PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection

Oct 10, 2024
Via arXiv

StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads

Sep 14, 2024
Via arXiv

LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models

Aug 30, 2024
Via arXiv

Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos

Aug 26, 2024
Via arXiv

Unifying 3D Vision-Language Understanding via Promptable Queries

May 19, 2024
Via arXiv

Improving Detection in Aerial Images by Capturing Inter-Object Relationships

Apr 05, 2024
Via arXiv

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Dec 15, 2023
Via arXiv

Feedback RoI Features Improve Aerial Object Detection

Nov 28, 2023
Via arXiv