Jiankang Deng

Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models

Dec 22, 2025

An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

Dec 19, 2025

SATGround: A Spatially-Aware Approach for Visual Grounding in Remote Sensing

Dec 09, 2025

Reconstructing 3D Scenes in Native High Dynamic Range

Nov 17, 2025

Plug-and-Play Clarifier: A Zero-Shot Multimodal Framework for Egocentric Intent Disambiguation

Nov 12, 2025

RetouchLLM: Training-free White-box Image Retouching

Oct 09, 2025

Embodied Arena: A Comprehensive, Unified, and Evolving Evaluation Platform for Embodied AI

Sep 18, 2025

LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving

Aug 17, 2025

Unlocking the Potential of Diffusion Priors in Blind Face Restoration

Aug 12, 2025

Region-based Cluster Discrimination for Visual Representation Learning

Jul 26, 2025