Jiankang Deng

MaDiS: Taming Masked Diffusion Language Models for Sign Language Generation

Jan 27, 2026

Optimizing Multimodal LLMs for Egocentric Video Understanding: A Solution for the HD-EPIC VQA Challenge

Jan 15, 2026

Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models

Dec 22, 2025

An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

Dec 19, 2025

SATGround: A Spatially-Aware Approach for Visual Grounding in Remote Sensing

Dec 09, 2025

Reconstructing 3D Scenes in Native High Dynamic Range

Nov 17, 2025

Plug-and-Play Clarifier: A Zero-Shot Multimodal Framework for Egocentric Intent Disambiguation

Nov 12, 2025

RetouchLLM: Training-free White-box Image Retouching

Oct 09, 2025

Embodied Arena: A Comprehensive, Unified, and Evolving Evaluation Platform for Embodied AI

Sep 18, 2025

LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving

Aug 17, 2025