Daniel Burkhardt

The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

Jun 24, 2026

Alex Iacob, Andrej Jovanović, William F. Shen, Daniel Burkhardt, Meghdad Kurmanji, Nurbek Tastan, Lorenzo Sani, Niccolò Alberto Elia Venanzi, Ambroise Odonnat, Zeyu Cao(+3 more)

Abstract: Self-improving agents are state-of-the-art (SOTA) on agentic coding benchmarks and have recently been extended to general domains. However, their search methods generally assume a stationary evaluation criterion: a fixed verifier, benchmark, or labeled dataset that remains valid as the agent improves. This ignores a central feature of evolution: species adapt as their environments change with them. We aim to bring the same principle to recursive self-improvement, making evaluation part of the improvement loop and opening search to evolving evaluators, adversarial objectives, and dynamic utilities that may surpass static benchmarks. We introduce the Red Queen Gödel Machine (RQGM), an evolutionary framework for recursive self-improvement under non-stationary utilities. The RQGM makes this possible through controlled utility evolution: search is organized into epochs with a fixed within-epoch evaluation criterion, while the utility can be updated at epoch boundaries, so self-improvement guarantees hold per epoch as the objective evolves across them. We begin by showing that even on verifiable coding tasks, the RQGM improves test pass rate over the prior SOTA by adding a complementary agent-as-a-judge code-review signal. This signal is also cheaper: the RQGM uses 1.35x-1.72x fewer tokens. We then turn to scientific paper writing and reviewing, and to Olympiad-level proof writing and grading, where the RQGM improves performance over prior self-improving agents: co-evolved writers reach 1.78x-1.86x higher acceptance rates under a diverse agent-as-a-judge panel, while co-evolved graders reach 9% higher ground-truth accuracy. In paper reviewing, the strongest baseline reviewer over-accepts AI-generated papers at up to 1.91x the human rate. The RQGM corrects this by introducing an adversarial objective that discovers reviewers equally stringent on AI-generated and human-written work.

* 12 pages main text + 21 pages appendix (37 pages total, incl. references); 10 figures (6 main text + 4 appendix); 10 tables (2 main text + 8 appendix). Preliminary preprint; work in progress. Keywords: self-improving agents, learned evaluation, multi-agent systems, automated scientific discovery, controlled utility evolution, co-evolutionary search, autoresearch
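To make "controlled utility evolution" concrete, here is a toy sketch, not the RQGM itself. Everything in it is a hypothetical stand-in: an "agent" is a single float, mutation is Gaussian noise, and `make_utility` / `moving_target` are invented helpers. It only illustrates the epoch structure described above: the utility is frozen inside an epoch and may change only at epoch boundaries, so within each epoch a greedy search's best score under that epoch's utility never decreases.

```python
import random
from typing import Callable, List

# Hypothetical toy setting: an "agent" is a single float and a
# "utility" scores it. Neither reflects the paper's actual agents.
Agent = float
Utility = Callable[[Agent], float]


def evolve_with_epochs(
    seed_agents: List[Agent],
    make_utility: Callable[[int, List[Agent]], Utility],
    epochs: int,
    steps_per_epoch: int,
    seed: int = 0,
) -> List[List[float]]:
    """Run epoch-structured greedy search; return best-score traces per epoch."""
    rng = random.Random(seed)
    archive = list(seed_agents)
    traces: List[List[float]] = []
    for epoch in range(epochs):
        # The utility may change only here, at an epoch boundary,
        # and it may depend on the archive evolved so far.
        utility = make_utility(epoch, archive)
        best = max(archive, key=utility)
        trace = [utility(best)]
        for _ in range(steps_per_epoch):
            child = best + rng.gauss(0.0, 0.5)  # hypothetical mutation
            archive.append(child)
            if utility(child) > utility(best):
                best = child  # keep the incumbent unless the child is strictly better
            trace.append(utility(best))
        traces.append(trace)
    return traces


def moving_target(epoch: int, archive: List[Agent]) -> Utility:
    # Hypothetical non-stationary utility: the target moves each epoch,
    # loosely mimicking an evaluator that becomes harder to satisfy.
    target = 2.0 * (epoch + 1)
    return lambda agent: -abs(agent - target)


traces = evolve_with_epochs([0.0], moving_target, epochs=3, steps_per_epoch=50)
for epoch, trace in enumerate(traces):
    print(epoch, round(trace[0], 3), round(trace[-1], 3))
```

Each trace is monotone only with respect to its own epoch's utility; comparing scores across epochs is not meaningful because the objective has changed, which is why any guarantee in this setting is stated per epoch.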


Finding Archetypal Spaces for Data Using Neural Networks

Jan 25, 2019

David van Dijk, Daniel Burkhardt, Matthew Amodio, Alex Tong, Guy Wolf, Smita Krishnaswamy

Abstract: Archetypal analysis is a type of factor analysis in which data are fit by a convex polytope whose corners are "archetypes" of the data, with each data point represented as a convex combination of these archetypal points. While archetypal analysis has been used on biological data, it has not achieved widespread adoption because most data are not well fit by a convex polytope, either in the ambient space or after standard data transformations. We propose a new approach to archetypal analysis. Instead of fitting a convex polytope directly on the data or after a specific data transformation, we train a neural network (AAnet) to learn a transformation under which the data best fit into a polytope. We validate this approach on synthetic data to which we add nonlinearity. Here, AAnet is the only method that correctly identifies the archetypes. We also demonstrate AAnet on two biological datasets. In a T cell dataset measured with single-cell RNA sequencing, AAnet identifies several archetypal states corresponding to naive, memory, and cytotoxic T cells. In a dataset of gut microbiome profiles, AAnet recovers previously described microbiome states and also identifies novel extrema in the data. Finally, we show that AAnet has generative properties that allow us to sample uniformly from the data geometry even when the input data are not uniformly distributed.

* 8 pages, 11 figures, submitted to ICML 2019
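The convex-combination representation in this abstract can be illustrated with a small sketch. It is a simplification and not AAnet itself: in AAnet the polytope lives in a space learned by the network, whereas here the hypothetical archetypes are fixed corners of a 2-D triangle and no network is involved. Drawing weights from Dirichlet(1, ..., 1), generated here by normalizing i.i.d. Exponential(1) draws, gives points spread uniformly over the polytope, which is one simple way to picture uniform sampling in archetypal coordinates.

```python
import random
from typing import List, Sequence

Vector = List[float]


def uniform_simplex_weights(k: int, rng: random.Random) -> Vector:
    """Draw k convex weights uniformly from the (k-1)-simplex.

    Normalized i.i.d. Exponential(1) draws follow Dirichlet(1, ..., 1),
    which is the uniform distribution on the simplex.
    """
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def convex_combination(archetypes: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return sum_i weights[i] * archetypes[i] (weights nonnegative, summing to 1)."""
    dim = len(archetypes[0])
    return [sum(w * a[j] for w, a in zip(weights, archetypes)) for j in range(dim)]


# Hypothetical archetypes: corners of a triangle in 2-D.
archetypes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)
samples = [
    convex_combination(archetypes, uniform_simplex_weights(3, rng))
    for _ in range(1000)
]
```

Every sample lies inside the triangle spanned by the archetypes, and a one-hot weight vector reproduces the corresponding archetype exactly.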
