Yangqiu Song

NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents

Oct 08, 2025
Via arXiv

The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas

Oct 08, 2025
Via arXiv

LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game

Oct 06, 2025
Via arXiv

Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance

Sep 26, 2025
Via arXiv

Structuring the Unstructured: A Systematic Review of Text-to-Structure Generation for Agentic AI with a Universal Evaluation Framework

Aug 17, 2025
Via arXiv

Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty

Aug 12, 2025
Via arXiv

SessionIntentBench: A Multi-task Inter-session Intention-shift Modeling Benchmark for E-commerce Customer Behavior Understanding

Jul 27, 2025
Via arXiv

From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

Jun 23, 2025
Via arXiv

Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?

May 30, 2025
Via arXiv

AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora

May 29, 2025
Via arXiv