Wenyue Hua

Epistemic Context Learning: Building Trust the Right Way in LLM-Based Multi-Agent Systems

Jan 29, 2026

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Jul 28, 2025

MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation

Jun 25, 2025

Semantic Scheduling for LLM Inference

Jun 13, 2025

THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Apr 17, 2025

REALM: A Dataset of Real-World LLM Use Cases

Mar 24, 2025

AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence

Mar 11, 2025

InductionBench: LLMs Fail in the Simplest Complexity Class

Feb 26, 2025

Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents

Feb 18, 2025

ADO: Automatic Data Optimization for Inputs in LLM Prompts

Feb 17, 2025