Sunghwan Kim

ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions

May 29, 2025
Via arXiv

LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study

May 26, 2025
Via arXiv

Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance

May 22, 2025
Via arXiv

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

May 21, 2025
Via arXiv

Integration of TinyML and LargeML: A Survey of 6G and Beyond

May 20, 2025
Via arXiv

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization

May 19, 2025
Via arXiv

MISO: Multiresolution Submap Optimization for Efficient Globally Consistent Neural Implicit Reconstruction

Apr 27, 2025
Via arXiv

Stop Playing the Guessing Game! Target-free User Simulation for Evaluating Conversational Recommender Systems

Nov 25, 2024
Via arXiv

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Oct 17, 2024
Via arXiv

Evaluating Robustness of Reward Models for Mathematical Reasoning

Oct 02, 2024
Via arXiv