Jasper Dekoninck

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

Oct 06, 2025
Via arXiv

Constrained Decoding of Diffusion LLMs with Context-Free Grammars

Aug 13, 2025
Via arXiv

MathArena: Evaluating LLMs on Uncontaminated Math Competitions

May 29, 2025
Via arXiv

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Mar 27, 2025
Via arXiv

A Unified Approach to Routing and Cascading for LLMs

Oct 14, 2024
Via arXiv

Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation

Sep 01, 2024
Via arXiv

ConStat: Performance-Based Contamination Detection in Large Language Models

May 25, 2024
Via arXiv

Evading Data Contamination Detection for Language Models is (too) Easy

Feb 12, 2024
Via arXiv

Controlled Text Generation via Language Model Arithmetic

Nov 24, 2023
Via arXiv