Dan Xu

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Dec 18, 2025

TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition

Dec 12, 2025

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Dec 11, 2025

FullPart: Generating each 3D Part at Full Resolution

Oct 30, 2025

CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

Oct 09, 2025

Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

Oct 06, 2025

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Jul 28, 2025

UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation

Jul 03, 2025

HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning

May 21, 2025

Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields

May 04, 2025