
Yu Deng

IBM

HiSpatial: Taming Hierarchical 3D Spatial Understanding in Vision-Language Models

Mar 26, 2026

A Context Engineering Framework for Improving Enterprise AI Agents based on Digital-Twin MDP

Mar 23, 2026

Robot-DIFT: Distilling Diffusion Features for Geometrically Consistent Visuomotor Control

Feb 12, 2026

Think Locally, Explain Globally: Graph-Guided LLM Investigations via Local Reasoning and Belief Propagation

Jan 25, 2026

LLM-powered Real-time Patent Citation Recommendation for Financial Technologies

Jan 23, 2026

VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image

Dec 16, 2025

Native and Compact Structured Latents for 3D Generation

Dec 16, 2025

Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision

Nov 13, 2025

STORM: Segment, Track, and Object Re-Localization from a Single 3D Model

Nov 12, 2025

Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

Oct 24, 2025