Picture for Wenyue Hua

Wenyue Hua

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Add code
Jul 28, 2025
Viaarxiv icon

MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation

Add code
Jun 25, 2025
Viaarxiv icon

Semantic Scheduling for LLM Inference

Add code
Jun 13, 2025
Viaarxiv icon

THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Add code
Apr 17, 2025
Viaarxiv icon

REALM: A Dataset of Real-World LLM Use Cases

Add code
Mar 24, 2025
Viaarxiv icon

AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence

Add code
Mar 11, 2025
Figure 1 for AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence
Figure 2 for AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence
Figure 3 for AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence
Figure 4 for AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence
Viaarxiv icon

InductionBench: LLMs Fail in the Simplest Complexity Class

Add code
Feb 26, 2025
Viaarxiv icon

Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents

Add code
Feb 18, 2025
Figure 1 for Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Figure 2 for Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Figure 3 for Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Figure 4 for Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Viaarxiv icon

ADO: Automatic Data Optimization for Inputs in LLM Prompts

Add code
Feb 17, 2025
Viaarxiv icon

Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense

Add code
Jan 05, 2025
Viaarxiv icon