William Yang Wang

MacRAG: Compress, Slice, and Scale-up for Multi-Scale Adaptive Context RAG

May 10, 2025

THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Apr 17, 2025

Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning

Apr 04, 2025

REALM: A Dataset of Real-World LLM Use Cases

Mar 24, 2025

AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence

Mar 11, 2025

InductionBench: LLMs Fail in the Simplest Complexity Class

Feb 26, 2025

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Feb 20, 2025

MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison

Feb 07, 2025

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

Jan 13, 2025

Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework

Dec 22, 2024