Siyu Yuan

Curse of Knowledge: When Complex Evaluation Context Benefits yet Biases LLM Judges

Sep 03, 2025

ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Aug 26, 2025

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

May 26, 2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

May 21, 2025

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Mar 20, 2025

Implicit Reasoning in Transformers is Reasoning through Shortcuts

Mar 10, 2025

Hybrid CNN-Dilated Self-attention Model Using Inertial and Body-Area Electrostatic Sensing for Gym Workout Recognition, Counting, and User Authentification

Mar 08, 2025

CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

Feb 13, 2025

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Jan 20, 2025

ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use

Jan 07, 2025