Zhang-Wei Hong

BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization

Dec 29, 2025
Via arXiv

Tailored Primitive Initialization is the Secret Key to Reinforcement Learning

Nov 16, 2025
Via arXiv

ReGen: Generative Robot Simulation via Inverse Design

Nov 06, 2025
Via arXiv

Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS

Aug 19, 2025
Via arXiv

Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering

May 29, 2025
Via arXiv

RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning

May 21, 2025
Via arXiv

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Feb 04, 2025
Via arXiv

Embodied Red Teaming for Auditing Robotic Foundation Models

Nov 27, 2024
Via arXiv

ImageNet-RIB Benchmark: Large Pre-Training Datasets Don't Guarantee Robustness after Fine-Tuning

Oct 28, 2024
Via arXiv

ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization

Oct 17, 2024
Via arXiv