Jianke Zhu

RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

Oct 02, 2025

MambaMap: Online Vectorized HD Map Construction using State Space Model

Jul 27, 2025

SAM4D: Segment Anything in Camera and LiDAR Streams

Jun 26, 2025

OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation

Jun 23, 2025

PixelThink: Towards Efficient Chain-of-Pixel Reasoning

May 29, 2025

Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps

May 24, 2025

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

Apr 25, 2025

PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning

Apr 22, 2025

Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction

Mar 29, 2025

HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation

Feb 10, 2025