Eunsu Kim

BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

Jun 14, 2024
Via arXiv

CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean

Mar 15, 2024
Via arXiv

Multi-FAct: Assessing Multilingual LLMs' Multi-Regional Knowledge using FActScore

Mar 01, 2024
Via arXiv

The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate

Feb 09, 2024
Via arXiv