Guijin Son

Controlling Language Confusion in Multilingual LLMs

May 25, 2025

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

May 17, 2025

On the Robustness of Reward Models for Language Model Alignment

May 12, 2025

HRET: A Self-Evolving LLM Evaluation Toolkit for Korean

Apr 01, 2025

Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning

Feb 24, 2025

Multi-Step Reasoning in Korean and the Emergent Mirage

Jan 10, 2025

Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap

Jan 05, 2025

Improving Fine-grained Visual Understanding in VLMs through Text-Only Training

Dec 17, 2024

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models

Oct 23, 2024

LLM-as-a-Judge & Reward Model: What They Can and Cannot Do

Sep 17, 2024