
Viola Campos

Multicalibration for LLM-based Code Generation

Dec 09, 2025

Viola Campos, Robin Kuschnereit, Adrian Ulges

Abstract: As AI-based code generation becomes widespread, researchers are investigating the calibration of code LLMs, i.e., ensuring that their confidence scores faithfully reflect the true likelihood that the generated code is correct. To this end, we investigate multicalibration, which can capture additional factors of a coding problem, such as its complexity, the code length, or the programming language used. We study four multicalibration approaches on three function synthesis benchmarks using latest-generation code LLMs (Qwen3 Coder, GPT-OSS, DeepSeek-R1-Distill). Our results demonstrate that multicalibration can yield distinct improvements over both uncalibrated token likelihoods (+1.03 in skill score) and baseline calibrations (+0.37 in skill score). We study the influence of the aforementioned factors in ablations and make our dataset (consisting of code generations, likelihoods, and correctness labels) available for future research on code LLM calibration.

* Accepted at AI-SQE 2026 (The 1st International Workshop on AI for Software Quality Evaluation: Judgment, Metrics, Benchmarks, and Beyond)
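To make the multicalibration idea concrete, here is a minimal Python sketch of one generic approach: a simplified version of the iterative patching algorithm from the multicalibration literature (Hébert-Johnson et al., 2018). It is an illustration under stated assumptions, not one of the four approaches evaluated in the paper, whose details the abstract does not give. Each generated program is assumed to carry a confidence (for example, a token likelihood), a 0/1 correctness label, and a set of possibly overlapping groups such as its programming language or a code-length bucket. The function names, the `n_bins`/`alpha`/`max_iter` parameters, and the use of the Brier score as the error measure are all hypothetical choices.

```python
from collections import defaultdict

# Hypothetical sketch: simplified iterative multicalibration, not the paper's method.

def bin_index(p: float, n_bins: int) -> int:
    """Map a probability in [0, 1] to one of n_bins equal-width bins."""
    return min(int(p * n_bins), n_bins - 1)

def multicalibrate(probs, labels, groups, n_bins=10, alpha=0.01, max_iter=100):
    """Iteratively patch confidences until, in every (group, bin) cell,
    the mean confidence is within alpha of the empirical pass rate.

    probs:  initial confidences, e.g. normalized token likelihoods (assumed)
    labels: 1 if the generated code passed its tests, else 0
    groups: for each sample, a set of (possibly overlapping) group names
    """
    p = [float(x) for x in probs]
    for _ in range(max_iter):
        # Partition samples into (group, confidence-bin) cells.
        cells = defaultdict(list)
        for i, (pi, gs) in enumerate(zip(p, groups)):
            for g in gs:
                cells[(g, bin_index(pi, n_bins))].append(i)
        # Find the cell with the largest mean residual (label - confidence).
        worst_gap, worst_cell = 0.0, None
        for members in cells.values():
            gap = sum(labels[i] - p[i] for i in members) / len(members)
            if abs(gap) > abs(worst_gap):
                worst_gap, worst_cell = gap, members
        if worst_cell is None or abs(worst_gap) <= alpha:
            break  # every cell is alpha-calibrated
        # Shift that cell by its mean residual, clipped to valid probabilities.
        for i in worst_cell:
            p[i] = min(1.0, max(0.0, p[i] + worst_gap))
    return p

def brier(p, y):
    """Mean squared error between confidence and correctness (lower is better)."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)

# Toy usage: an overconfident "python" group and an underconfident "rust" group.
probs = [0.9, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.2]
labels = [1, 0, 1, 0, 1, 1, 1, 0]
groups = [{"python"}] * 4 + [{"rust"}] * 4
patched = multicalibrate(probs, labels, groups)
print([round(x, 2) for x in patched])  # [0.5, 0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 0.75]
```

In practice the patches would be learned on a held-out calibration split and then applied to new generations; this sketch only returns the patched in-sample confidences. Because each step shifts a cell by its mean residual, and clipping to [0, 1] cannot increase squared error for 0/1 labels, the in-sample Brier score never increases from one step to the next.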


Addressing Leakage in Self-Supervised Contextualized Code Retrieval

Apr 17, 2022

Johannes Villmow, Viola Campos, Adrian Ulges, Ulrich Schwanecke


Abstract: We address contextualized code retrieval, i.e., the search for code snippets that help fill gaps in a partial input program. Our approach enables large-scale self-supervised contrastive training by randomly splitting source code into contexts and targets. To combat leakage between the two, we propose a novel approach based on mutual identifier masking, dedentation, and the selection of syntax-aligned targets. Our second contribution is a new dataset for the direct evaluation of contextualized code retrieval, based on a dataset of manually aligned subpassages of code clones. Our experiments demonstrate that our approach substantially improves retrieval and yields new state-of-the-art results for code clone and defect detection.

* 4 pages, 5 figures
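The leakage-prevention steps named in this abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract gives no exact rules, so the following choices are assumptions. A syntax-aligned target is taken to be the lines spanned by one randomly chosen statement node from Python's `ast` module; dedentation uses `textwrap.dedent`; and mutual identifier masking replaces every identifier that occurs in both context and target with a placeholder in both halves. The helper names (`make_pair`, `identifiers`, `mask`) and the `<GAP>`/`<MASK>` tokens are hypothetical, and the regex-based masking is an approximation that also touches strings and comments.

```python
import ast
import keyword
import random
import re
import textwrap

# Hypothetical sketch of leakage-reduced context/target pairs; not the paper's code.

IDENT = re.compile(r"\b[A-Za-z_]\w*\b")

def identifiers(code: str) -> set[str]:
    """Identifier-like tokens, excluding Python keywords (regex approximation)."""
    return {t for t in IDENT.findall(code) if not keyword.iskeyword(t)}

def mask(code: str, names: set[str], placeholder: str = "<MASK>") -> str:
    """Replace every whole-word occurrence of the given names with a placeholder."""
    if not names:
        return code
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, sorted(names))) + r")\b")
    return pattern.sub(placeholder, code)

def make_pair(source: str, seed: int = 0) -> tuple[str, str]:
    """Split source into (context, target) around one randomly chosen statement."""
    lines = source.splitlines()
    stmts = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.stmt)]
    node = random.Random(seed).choice(stmts)
    start, end = node.lineno - 1, node.end_lineno
    # Syntax-aligned target (line-level): the lines spanned by one statement node,
    # dedented so its indentation does not reveal its nesting depth.
    target = textwrap.dedent("\n".join(lines[start:end]))
    context = "\n".join(lines[:start] + ["<GAP>"] + lines[end:])
    # Mutual masking: hide identifiers that occur in both halves.
    shared = identifiers(context) & identifiers(target)
    return mask(context, shared), mask(target, shared)

src = "def area(width, height):\n    result = width * height\n    return result\n"
context, target = make_pair(src, seed=1)
print(context)
print(target)
```

For example, if the chosen target is `result = width * height` inside `area(width, height)`, all three names also appear in the context, so the target becomes `<MASK> = <MASK> * <MASK>` and a retrieval model can no longer match the two halves through shared names alone.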
