API Benchmark


From Pen to Pixel: Translating Hand-Drawn Plots into Graphical APIs via a Novel Benchmark and Efficient Adapter

Mar 27, 2026
Via arXiv

CADSmith: Multi-Agent CAD Generation with Programmatic Geometric Validation

Mar 27, 2026
Via arXiv

MobileDev-Bench: A Comprehensive Benchmark for Evaluating Language Models on Mobile Application Development

Mar 26, 2026
Via arXiv

Learning to Commit: Generating Organic Pull Requests via Online Repository Memory

Mar 27, 2026
Via arXiv

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

Mar 26, 2026
Via arXiv

GTO Wizard Benchmark

Mar 24, 2026
Via arXiv

DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models

Mar 25, 2026
Via arXiv

Can Small Models Reason About Legal Documents? A Comparative Study

Mar 26, 2026
Via arXiv

Reasoner-Executor-Synthesizer: Scalable Agentic Architecture with Static O(1) Context Window

Mar 23, 2026
Via arXiv

GraphRAG for Engineering Diagrams: ChatP&ID Enables LLM Interaction with P&IDs

Mar 23, 2026
Via arXiv