Topic


MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs

May 27, 2025
Via arXiv

Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science

May 27, 2025
Via arXiv

Def-DTS: Deductive Reasoning for Open-domain Dialogue Topic Segmentation

May 27, 2025
Via arXiv

Public Discourse Sandbox: Facilitating Human and AI Digital Communication Research

May 27, 2025
Via arXiv

The Many Challenges of Human-Like Agents in Virtual Game Environments

May 26, 2025
Via arXiv

What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs

May 26, 2025
Via arXiv

Discovering Forbidden Topics in Language Models

May 26, 2025
Via arXiv

Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering

May 26, 2025
Via arXiv

Inverse Q-Learning Done Right: Offline Imitation Learning in $Q^π$-Realizable MDPs

May 26, 2025
Via arXiv

MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

May 26, 2025
Via arXiv