Picture for Hyungjoo Chae

Hyungjoo Chae

Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression

Add code
May 20, 2026
Viaarxiv icon

Safe and Scalable Web Agent Learning via Recreated Websites

Add code
Mar 11, 2026
Viaarxiv icon

ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions

Add code
May 29, 2025
Viaarxiv icon

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Add code
May 21, 2025
Figure 1 for Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Figure 2 for Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Figure 3 for Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Figure 4 for Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Viaarxiv icon

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization

Add code
May 19, 2025
Viaarxiv icon

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Add code
Oct 17, 2024
Figure 1 for Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Figure 2 for Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Figure 3 for Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Figure 4 for Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Viaarxiv icon

Evaluating Robustness of Reward Models for Mathematical Reasoning

Add code
Oct 02, 2024
Figure 1 for Evaluating Robustness of Reward Models for Mathematical Reasoning
Figure 2 for Evaluating Robustness of Reward Models for Mathematical Reasoning
Figure 3 for Evaluating Robustness of Reward Models for Mathematical Reasoning
Figure 4 for Evaluating Robustness of Reward Models for Mathematical Reasoning
Viaarxiv icon

Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code

Add code
Sep 29, 2024
Figure 1 for Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
Figure 2 for Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
Figure 3 for Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
Figure 4 for Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
Viaarxiv icon

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Add code
Jun 20, 2024
Figure 1 for Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Figure 2 for Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Figure 3 for Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Figure 4 for Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Viaarxiv icon

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Add code
Jun 09, 2024
Figure 1 for The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Figure 2 for The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Figure 3 for The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Figure 4 for The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Viaarxiv icon