Jingwen Gu

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation

Oct 16, 2025
Via arXiv

ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Jun 23, 2025
Via arXiv

Orchestrating LLMs with Different Personalizations

Jul 04, 2024
Via arXiv