Abstract: In this work, we introduce a Tropical Axial Attention neural reasoning architecture that replaces vanilla softmax dot-product attention with max-plus operators, inducing a piecewise-linear structure aligned with dynamic programming formulations. From multi-species sequence alignments, our model learns all pairwise distances and is trained with a combination of $\ell_1$ and tropical symmetric distance losses together with an ultrametric violation penalty. We leverage the well-known isomorphism between the space of all phylogenetic trees on $n$ species and the tropical Grassmannian to show that tropical attention provides a natural geometric framework for phylogenetic inference. On the empirical DS1--DS11 alignments, where the true trees are unknown, the tropical model produces distance matrices that are substantially closer to their BME-induced tree metrics than those of the baseline models. These results suggest that tropical attention is a useful geometric inductive bias for neural phylogenetic inference, especially under distribution shift and when tree-metric consistency is important.
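The ultrametric violation penalty mentioned above can be illustrated with a minimal sketch (an illustrative reading, not the paper's implementation): a metric is ultrametric exactly when, for every triple of points, the two largest of the three pairwise distances coincide, so a natural hinge-style penalty sums the gaps between them.

```python
import itertools
import numpy as np

def ultrametric_violation(D):
    """Hinge-style penalty for the three-point ultrametric condition.

    A metric is ultrametric iff in every triple the two largest of the
    three pairwise distances are equal; we sum the gap between them,
    which is zero exactly for ultrametric distance matrices.
    """
    n = D.shape[0]
    total = 0.0
    for i, j, k in itertools.combinations(range(n), 3):
        a, b, _ = sorted([D[i, j], D[j, k], D[i, k]], reverse=True)
        total += a - b  # zero exactly when this triple is ultrametric
    return total
```

In a training loop, a differentiable variant of this sum (e.g. with a soft maximum) would be added to the $\ell_1$ and tropical distance losses.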
Abstract: We introduce a new strategy for compositional neural surrogates of radiation-matter interactions, a key task spanning domains from particle physics through nuclear and space engineering to medical physics. Exploiting the locality and Markov nature of particle interactions, we create a \emph{next-particle prediction} kernel using hybrid discrete-continuous transformer models based on Riemannian Flow Matching on product manifolds. The model generates variable-sized typed sets of particles and radiation side effects resulting from the interaction of an incident particle with a material volume. The resulting kernel can be composed to simulate unseen large-scale material distributions in a zero-shot manner. Unlike mechanistic simulators, our model is differentiable and provides tractable likelihoods for future downstream applications. Single-kernel execution shows a significant computational speed-up on GPU compared to CPU-bound mechanistic simulation. We evaluate the model at the kernel level and demonstrate predictive stability over multi-round autoregressive rollouts. We additionally release a novel 20M-event radiation-matter interaction dataset for further research.
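The compositionality idea, a single-volume kernel chained to simulate a larger material distribution, can be illustrated with a deliberately simplified toy (the `toy_kernel` below is a hypothetical placeholder for the learned transformer, and the 1-D voxel stack is a simplifying assumption):

```python
import random

def toy_kernel(energy, rng, attenuation=0.6):
    """Toy stand-in for the learned next-particle kernel: an incident
    particle yields 0-2 outgoing particles, each with strictly smaller
    energy than the input (attenuation < 1)."""
    n_out = rng.randint(0, 2)
    return [energy * attenuation * rng.uniform(0.5, 1.0) for _ in range(n_out)]

def rollout(initial_energy, n_voxels, cutoff=0.01, seed=0):
    """Compose the single-voxel kernel across a 1-D material stack:
    particles produced in voxel t become the inputs to voxel t+1,
    mirroring the Markov structure exploited by the surrogate."""
    rng = random.Random(seed)
    frontier = [initial_energy]
    for _ in range(n_voxels):
        nxt = []
        for e in frontier:
            nxt.extend(toy_kernel(e, rng))
        frontier = [e for e in nxt if e > cutoff]  # drop absorbed particles
        if not frontier:
            break
    return frontier
```

The real model replaces `toy_kernel` with the flow-matching transformer, but the outer composition loop is the same shape.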
Abstract: Using FlowBoost, a closed-loop deep generative optimization framework for extremal structure discovery, we investigate $\ell^p$-generalizations of the finite free Stam inequality for real-rooted polynomials under finite free additive convolution $\boxplus_n$. At $p=2$, FlowBoost finds the Hermite pair as the unique equality case and reveals the spectral structure of the linearized convolution map at this extremal point. As a result, we conjecture that the singular values of the doubly stochastic coupling matrix $E_n$ on the mean-zero subspace are $\{2^{-k/2} : k=1,\ldots,n-1\}$, independent of $n$. Conditional on this conjecture, we obtain a sharp local stability constant and the finite free CLT convergence rate, both uniform in $n$. We introduce a one-parameter family of $p$-Stam inequalities using $\ell^p$-Fisher information and prove that the Hermite pair itself violates the inequality for every $p>2$, with the sign of the deficit governed by the $\ell^p$-contraction ratio of $E_n$. Systematic computation via FlowBoost supports the conjecture that $p^*\!=2$ is the sharp critical exponent. For $p<2$, the extremal configurations undergo a bifurcation: they become non-matching pairs with bimodal root structure, converging back to the Hermite diagonal only as $p\to 2^-$. Our findings demonstrate that FlowBoost can be an effective tool for mathematical discovery in infinite-dimensional extremal problems.
Abstract: The discovery of extremal structures in mathematics requires navigating vast, nonconvex landscapes where analytical methods offer little guidance and brute-force search becomes intractable. We introduce FlowBoost, a closed-loop generative framework that learns to discover rare and extremal geometric structures by combining three components: (i) a geometry-aware conditional flow-matching model that learns to sample high-quality configurations, (ii) reward-guided policy optimization with action exploration that directly optimizes the generation process toward the objective while maintaining diversity, and (iii) stochastic local search for both training-data generation and final refinement. Unlike prior open-loop approaches, such as PatternBoost, which retrains on filtered discrete samples, or AlphaEvolve, which relies on frozen Large Language Models (LLMs) as evolutionary mutation operators, FlowBoost enforces geometric feasibility during sampling and propagates the reward signal directly into the generative model. This closes the optimization loop, requires much smaller training sets and shorter training times, reduces the required outer-loop iterations by orders of magnitude, and eliminates the dependence on LLMs. We demonstrate the framework on four geometric optimization problems: sphere packing in hypercubes, circle packing maximizing the sum of radii, the Heilbronn triangle problem, and star discrepancy minimization. In several cases, FlowBoost discovers configurations that match or exceed the best known results. For circle packings, we improve the best known lower bounds, surpassing the LLM-based system AlphaEvolve while using substantially fewer computational resources.
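As a rough illustration of component (iii), here is a minimal stochastic local search for the sphere-packing objective (maximize the minimum pairwise distance of points in the unit hypercube). The step size, single-point moves, and greedy acceptance rule are simplifying assumptions, not FlowBoost's actual refinement procedure:

```python
import numpy as np

def min_pairwise_dist(X):
    """Packing objective: smallest distance between any two points."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return d[np.triu_indices(n, k=1)].min()

def local_search(X, steps=500, sigma=0.05, seed=0):
    """Greedy stochastic local search: perturb one random point with
    Gaussian noise (clipped to the unit cube) and accept the move only
    if it improves the minimum pairwise distance."""
    rng = np.random.default_rng(seed)
    best = min_pairwise_dist(X)
    for _ in range(steps):
        i = rng.integers(len(X))
        cand = X.copy()
        cand[i] = np.clip(cand[i] + sigma * rng.standard_normal(X.shape[1]), 0.0, 1.0)
        score = min_pairwise_dist(cand)
        if score > best:  # accept only improving moves
            X, best = cand, score
    return X, best
```

In the full framework, the same routine would refine both the generated training configurations and the final candidates sampled from the flow model.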
Abstract: Dynamic programming (DP) algorithms for combinatorial optimization problems combine maximization or minimization with classical addition in their recursions. The associated value functions correspond to convex polyhedra in the max-plus semiring. Existing Neural Algorithmic Reasoning models, however, rely on softmax-normalized dot-product attention, whose smooth exponential weighting blurs these sharp polyhedral structures and collapses when evaluated in out-of-distribution (OOD) settings. We introduce Tropical attention, a novel attention function that operates natively in the max-plus semiring of tropical geometry. We prove that Tropical attention can approximate tropical circuits of DP-type combinatorial algorithms. We then show empirically that Tropical transformers improve OOD performance, in both length generalization and value generalization, on algorithmic reasoning tasks, surpassing softmax baselines while remaining stable under adversarial attacks. We also present adversarial-attack generalization as a third axis for Neural Algorithmic Reasoning benchmarking. Our results demonstrate that Tropical attention restores the sharp, scale-invariant reasoning absent from softmax.
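The core substitution, max-plus in place of softmax, can be sketched in a few lines of NumPy (an illustrative reading of tropical attention, not the paper's exact architecture): every sum-of-products in standard attention becomes a max-of-sums, so each output entry is a piecewise-linear function of the inputs and the whole map is equivariant to additive shifts.

```python
import numpy as np

def tropical_attention(Q, K, V):
    """Max-plus analogue of dot-product attention (illustrative sketch).

    Scores are tropical inner products max_d (Q[i,d] + K[j,d]); the output
    is the max-plus matrix product of the scores with V. No softmax is
    applied, so the sharp polyhedral structure is preserved.
    """
    # Tropical "dot product": replace sum-of-products with max-of-sums.
    scores = np.max(Q[:, None, :] + K[None, :, :], axis=-1)   # (n, n)
    # Tropical matrix product with V: again max-of-sums, no weighted average.
    out = np.max(scores[:, :, None] + V[None, :, :], axis=1)  # (n, d_v)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 3)) for _ in range(3))
```

Note the shift equivariance: adding a constant $c$ to $Q$ adds exactly $c$ to the output, the tropical analogue of scale invariance in the max-plus semiring.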
Abstract: How can Transformers model and learn enumerative geometry? What is a robust procedure for using Transformers in abductive knowledge discovery within a mathematician-machine collaboration? In this work, we introduce a new paradigm in computational enumerative geometry by analyzing the $\psi$-class intersection numbers on the moduli space of curves. By formulating the enumerative problem as a continuous optimization task, we develop a Transformer-based model for computing $\psi$-class intersection numbers based on the underlying quantum Airy structure. For a finite range of genera, our model is capable of regressing intersection numbers spanning an extremely wide range of values, from $10^{-45}$ to $10^{45}$. To provide a proper inductive bias for capturing the recursive behavior of intersection numbers, we propose a new activation function, the Dynamic Range Activator (DRA). Moreover, given the severe heteroscedasticity of $\psi$-class intersections and the required precision, we quantify the uncertainty of the predictions using Conformal Prediction with a dynamic sliding window that is aware of the number of marked points. Next, we go beyond merely computing intersection numbers and explore the enumerative "world-model" of the Transformers. Through a series of causal inference and correlational interpretability analyses, we demonstrate that Transformers are actually modeling Virasoro constraints in a purely data-driven manner. Additionally, we provide evidence for the comprehension of several values appearing in the large-genus asymptotics of $\psi$-class intersection numbers through abductive hypothesis testing.
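The sliding-window conformal step can be sketched as follows (a generic split-conformal sketch assuming absolute-residual scores; the paper's dynamic window sizing by the number of marked points is not reproduced here):

```python
import numpy as np

def sliding_conformal_radius(residuals, window, alpha=0.1):
    """Half-width of a split-conformal prediction interval built from the
    last `window` absolute residuals, using the finite-sample-corrected
    quantile level ceil((w+1)(1-alpha))/w so that marginal coverage is
    at least 1 - alpha under exchangeability."""
    r = np.abs(np.asarray(residuals, dtype=float)[-window:])
    w = len(r)
    q = min(1.0, np.ceil((w + 1) * (1 - alpha)) / w)
    return np.quantile(r, q, method="higher")
```

A prediction interval for a new point is then `[y_hat - radius, y_hat + radius]`; sliding the window lets the radius track the heteroscedastic residual scale across genera.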