Letian Wang

Grounded World Model for Semantically Generalizable Planning

Apr 13, 2026

LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving

Apr 09, 2026

DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning

Apr 02, 2026

THFM: A Unified Video Foundation Model for 4D Human Perception and Beyond

Mar 26, 2026

SCATR: Mitigating New Instance Suppression in LiDAR-based Tracking-by-Attention via Second Chance Assignment and Track Query Dropout

Mar 02, 2026

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Jan 04, 2026

OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving

Dec 16, 2025

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

Nov 18, 2025

ForeSight: Multi-View Streaming Joint Object Detection and Trajectory Forecasting

Aug 09, 2025

OpenNav: Open-World Navigation with Multimodal Large Language Models

Jul 24, 2025