Siyu Yuan

Curse of Knowledge: When Complex Evaluation Context Benefits yet Biases LLM Judges

Sep 03, 2025

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Aug 26, 2025

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

May 26, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

May 21, 2025

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Mar 20, 2025

Implicit Reasoning in Transformers is Reasoning through Shortcuts

Mar 10, 2025

Hybrid CNN-Dilated Self-attention Model Using Inertial and Body-Area Electrostatic Sensing for Gym Workout Recognition, Counting, and User Authentification

Mar 08, 2025

CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

Feb 13, 2025

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Jan 20, 2025

ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use

Jan 07, 2025