Rami Katan

PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code

Dec 22, 2025
Via arXiv

Beyond Blind Spots: Analytic Hints for Mitigating LLM-Based Evaluation Pitfalls

Dec 18, 2025
Via arXiv

Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes

Oct 31, 2025
Via arXiv

Using Combinatorial Optimization to Design a High quality LLM Solution

May 15, 2024
Via arXiv