
William Yang Wang

Dynamic Evaluation for Oversensitivity in LLMs

Oct 21, 2025

LEDOM: An Open and Fundamental Reverse Language Model

Jul 02, 2025

MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation

Jun 25, 2025

Semantic Scheduling for LLM Inference

Jun 13, 2025

MacRAG: Compress, Slice, and Scale-up for Multi-Scale Adaptive Context RAG

May 10, 2025

THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Apr 17, 2025

Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning

Apr 04, 2025

REALM: A Dataset of Real-World LLM Use Cases

Mar 24, 2025

AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence

Mar 11, 2025

InductionBench: LLMs Fail in the Simplest Complexity Class

Feb 26, 2025