Lu Sheng

ProGuard: Towards Proactive Multimodal Safeguard

Dec 29, 2025

Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation

Dec 24, 2025

RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics

Dec 15, 2025

InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE

Nov 17, 2025

TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics

Oct 08, 2025

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Aug 26, 2025

AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

Jun 24, 2025

ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models

Jun 11, 2025

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Jun 04, 2025

Personalize Anything for Free with Diffusion Transformer

Mar 16, 2025