Code Benchmark


SWE-PRBench: Benchmarking AI Code Review Quality Against Pull Request Feedback

Mar 27, 2026
Via arXiv

Cinematic Audio Source Separation Using Visual Cues

Mar 27, 2026
Via arXiv

Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering

Mar 27, 2026
Via arXiv

A Human-Inspired Decoupled Architecture for Efficient Audio Representation Learning

Mar 27, 2026
Via arXiv

Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

Mar 27, 2026
Via arXiv

A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation

Mar 27, 2026
Via arXiv

Zero-Shot Depth from Defocus

Mar 27, 2026
Via arXiv

ReCUBE: Evaluating Repository-Level Context Utilization in Code Generation

Mar 26, 2026
Via arXiv

LLM Benchmark-User Need Misalignment for Climate Change

Mar 27, 2026
Via arXiv

Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

Mar 27, 2026
Via arXiv