Abstract: Determining whether a program terminates is a central problem in computer science. Turing's foundational result established the Halting Problem as undecidable: no algorithm can determine termination for all programs and inputs. Consequently, automatic verification tools approximate termination and sometimes fail to prove or disprove it; these tools rely on problem-specific architectures and abstractions, and are usually tied to particular programming languages. The recent success of large language models (LLMs) raises the following question: can LLMs reliably predict program termination? In this work, we evaluate LLMs on a diverse set of C programs from the Termination category of the International Competition on Software Verification (SV-Comp) 2025. Our results suggest that LLMs perform remarkably well at predicting program termination: GPT-5 and Claude Sonnet 4.5 would rank just behind the top-ranked tool (using test-time scaling), and Code World Model (CWM) would place just behind the second-ranked tool. While LLMs are effective at predicting program termination, they often fail to provide a valid witness as proof. Moreover, LLM performance drops as program length increases. We hope these insights motivate further research into program termination and the broader potential of LLMs for reasoning about undecidable problems.
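To make the task concrete, here is a minimal sketch (illustrative, not taken from the paper or the benchmark suite) of the style of C program in SV-Comp's Termination category; `__VERIFIER_nondet_int` is SV-Comp's standard convention for an arbitrary input value, and a valid termination witness would name a ranking function such as f(x) = max(x, 0):

```c
// Illustrative SV-Comp-style termination task (not from the paper):
// does this loop halt for every possible input?
extern int __VERIFIER_nondet_int(void);

int main(void) {
    int x = __VERIFIER_nondet_int(); // arbitrary, unknown input
    // Terminates for all x: if x <= 0 the body never runs; if x > 0,
    // the value max(x, 0) strictly decreases on every iteration and
    // is bounded below by 0, so the loop must exit.
    while (x > 0) {
        x = x - 2;
    }
    return 0;
}
```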
Abstract: We report on our tool, Pulse Infinite, which uses proof techniques to show non-termination (divergence) in large programs. Pulse Infinite works compositionally and under-approximately: the former supports scale, and the latter ensures soundness when proving divergence. Prior work focused on small benchmarks of tens or hundreds of lines of code (LoC), and this limited scale constrains practicality: a single company may have tens of millions, or even hundreds of millions, of LoC. We report on applying Pulse Infinite to over a hundred million lines of open-source and proprietary software written in C, C++, and Hack, identifying over 30 previously unknown issues and establishing a new state of the art for detecting divergence in real-world codebases.
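As an illustration (hypothetical, not one of the paper's reported findings), the sketch below shows a classic bug pattern that an under-approximate divergence prover targets. Under-approximation only requires exhibiting one reachable execution with no progress, so a list containing a single element for which `skip` holds suffices to soundly prove divergence here:

```c
// Hypothetical divergence bug (not a reported finding): on the path
// where skip(cur) holds, the cursor is never advanced, so the loop
// diverges on any list that contains such an element.
#include <stddef.h>

struct node { int value; struct node *next; };

static int skip(const struct node *n) {
    return n->value < 0; // some domain-specific filter
}

int sum_kept(const struct node *head) {
    int total = 0;
    const struct node *cur = head;
    while (cur != NULL) {
        if (skip(cur)) {
            continue; // BUG: jumps back to the test without moving cur
        }
        total += cur->value;
        cur = cur->next;
    }
    return total;
}
```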