
Ruixi Qiao

Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning

Apr 21, 2025
Via arXiv

Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining

Oct 01, 2024
Via arXiv