Matei Zaharia

The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More

Mar 25, 2026
Via arXiv

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

Mar 09, 2026
Via arXiv

AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization

Feb 23, 2026
Via arXiv

Let the Barbarians In: How AI Can Accelerate Systems Performance Research

Dec 22, 2025
Via arXiv

DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis

Aug 27, 2025
Via arXiv

Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs

Aug 06, 2025
Via arXiv

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Jul 03, 2025
Via arXiv

LEANN: A Low-Storage Vector Index

Jun 09, 2025
Via arXiv

EXP-Bench: Can AI Conduct AI Research Experiments?

May 30, 2025
Via arXiv

ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring

Apr 21, 2025
Via arXiv