Taro Yano

An Empirical Study of LLM-as-a-Judge: How Design Choices Impact Evaluation Reliability

Jun 16, 2025
Via arXiv

LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents

May 28, 2025
Via arXiv

Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning

May 15, 2025
Via arXiv

Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance

Apr 28, 2025
Via arXiv

Can Large Language Models Invent Algorithms to Improve Themselves?

Oct 21, 2024
Via arXiv