Beong-woo Kwak

ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions

May 29, 2025
Via arXiv

LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study

May 26, 2025
Via arXiv

Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance

May 22, 2025
Via arXiv

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

May 21, 2025
Via arXiv

Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code

Sep 29, 2024
Via arXiv

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Jun 20, 2024
Via arXiv

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Apr 03, 2024
Via arXiv

Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset

Mar 08, 2024
Via arXiv

Modularized Transfer Learning with Multiple Knowledge Graphs for Zero-shot Commonsense Reasoning

Jun 22, 2022
Via arXiv

Dual Task Framework for Improving Persona-grounded Dialogue Dataset

Feb 16, 2022
Via arXiv