Bolian Li

Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents

Jan 29, 2026
Via arXiv

Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner

Aug 20, 2025
Via arXiv

Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization

Jun 07, 2025
Via arXiv

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

Apr 03, 2025
Via arXiv

Bayesian Computation in Deep Learning

Feb 26, 2025
Via arXiv

Making Reliable and Flexible Decisions in Long-tailed Classification

Jan 23, 2025
Via arXiv

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Oct 09, 2024
Via arXiv

Cascade Reward Sampling for Efficient Decoding-Time Alignment

Jun 24, 2024
Via arXiv

Entropy-MCMC: Sampling from Flat Basins with Ease

Oct 09, 2023
Via arXiv

Long-tailed Classification from a Bayesian-decision-theory Perspective

Mar 21, 2023
Via arXiv