Martin Vechev

CodeTaste: Can LLMs Generate Human-Level Code Refactorings?

Mar 04, 2026
Via arXiv

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Feb 25, 2026
Via arXiv

Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

Feb 12, 2026
Via arXiv

A Unified Framework for LLM Watermarks

Feb 06, 2026
Via arXiv

Learning Compact Boolean Networks

Feb 05, 2026
Via arXiv

AutoBaxBuilder: Bootstrapping Code Security Benchmarking

Dec 24, 2025
Via arXiv

Fewer Weights, More Problems: A Practical Attack on LLM Pruning

Oct 09, 2025
Via arXiv

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

Oct 06, 2025
Via arXiv

Constrained Decoding of Diffusion LLMs with Context-Free Grammars

Aug 13, 2025
Via arXiv

MathArena: Evaluating LLMs on Uncontaminated Math Competitions

May 29, 2025
Via arXiv