Alice Oh

KAIST

One-Topic-Doesn't-Fit-All: Transcreating Reading Comprehension Test for Personalized Learning

Nov 12, 2025
Via arXiv

Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues

Oct 21, 2025
Via arXiv

KORMo: Korean Open Reasoning Model for Everyone

Oct 10, 2025
Via arXiv

Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models

Aug 12, 2025
Via arXiv

On the Effect of Uncertainty on Layer-wise Inference Dynamics

Jul 09, 2025
Via arXiv

Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation

Jun 24, 2025
Via arXiv

Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents

Jun 05, 2025
Via arXiv

BLUCK: A Benchmark Dataset for Bengali Linguistic Understanding and Cultural Knowledge

May 27, 2025
Via arXiv

Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties

May 27, 2025
Via arXiv

MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language

May 20, 2025
Via arXiv