Qichao Zhang

TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data

Dec 22, 2025

WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving

Dec 22, 2025

CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic

Nov 15, 2025

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Jun 24, 2025

ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving

May 26, 2025

Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL

May 16, 2025

UncAD: Towards Safe End-to-end Autonomous Driving via Online Map Uncertainty

Apr 17, 2025

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

Mar 17, 2025

Dream to Drive with Predictive Individual World Model

Jan 28, 2025

Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model

Dec 22, 2024