Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jesse Thaler

Descending into the Modular Bootstrap

Apr 01, 2026

Nathan Benjamin, A. Liam Fitzpatrick, Wei Li, Jesse Thaler

Abstract:In this paper, we attempt to explore the landscape of two-dimensional conformal field theories (2d CFTs) by efficiently searching for numerical solutions to the modular bootstrap equation using machine-learning-style optimization. The torus partition function of a 2d CFT is fixed by the spectrum of its primary operators and its chiral algebra, which we take to be the Virasoro algebra with $c>1$. We translate the requirement that this partition function is modular invariant into a loss function, which we then minimize to identify possible primary spectra. Our approach involves two technical innovations that facilitate finding reliable candidate CFTs. The first is a strategy to estimate the uncertainty associated with truncating the spectrum to the lowest dimension operators. The second is the use of a new singular-value-based optimizer (Sven) that is more effective than gradient descent at navigating the hierarchical structure of the loss landscape. We numerically construct candidate truncated CFT partition functions with central charges between 1 and $\frac{8}{7}$, a range devoid of known examples, and argue that these candidates likely come from a continuous space of modular bootstrap solutions. We also provide evidence for a more stringent constraint on the spectral gap near $c = 1$ than the existing bound of $Δ_{\rm gap} \le \frac{c}{6} + \frac{1}{3}$.

* 57 pages, 23 figures, 4 tables; code available at http://github.com/jdthaler/modular-bootstrap

Via

Access Paper or Ask Questions

Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method

Apr 01, 2026

Samuel Bright-Thonney, Thomas R. Harvey, Andre Lukas, Jesse Thaler

Abstract:We introduce Sven (Singular Value dEsceNt), a new optimization algorithm for neural networks that exploits the natural decomposition of loss functions into a sum over individual data points, rather than reducing the full loss to a single scalar before computing a parameter update. Sven treats each data point's residual as a separate condition to be satisfied simultaneously, using the Moore-Penrose pseudoinverse of the loss Jacobian to find the minimum-norm parameter update that best satisfies all conditions at once. In practice, this pseudoinverse is approximated via a truncated singular value decomposition, retaining only the $k$ most significant directions and incurring a computational overhead of only a factor of $k$ relative to stochastic gradient descent. This is in comparison to traditional natural gradient methods, which scale as the square of the number of parameters. We show that Sven can be understood as a natural gradient method generalized to the over-parametrized regime, recovering natural gradient descent in the under-parametrized limit. On regression tasks, Sven significantly outperforms standard first-order methods including Adam, converging faster and to a lower final loss, while remaining competitive with LBFGS at a fraction of the wall-time cost. We discuss the primary challenge to scaling, namely memory overhead, and propose mitigation strategies. Beyond standard machine learning benchmarks, we anticipate that Sven will find natural application in scientific computing settings where custom loss functions decompose into several conditions.

Via

Access Paper or Ask Questions

A Lorentz-Equivariant Transformer for All of the LHC

Nov 01, 2024

Johann Brehmer, Víctor Bresó, Pim de Haan, Tilman Plehn, Huilin Qu, Jonas Spinner, Jesse Thaler

Abstract:We show that the Lorentz-Equivariant Geometric Algebra Transformer (L-GATr) yields state-of-the-art performance for a wide range of machine learning tasks at the Large Hadron Collider. L-GATr represents data in a geometric algebra over space-time and is equivariant under Lorentz transformations. The underlying architecture is a versatile and scalable transformer, which is able to break symmetries if needed. We demonstrate the power of L-GATr for amplitude regression and jet classification, and then benchmark it as the first Lorentz-equivariant generative network. For all three LHC tasks, we find significant improvements over previous architectures.

* 26 pages, 7 figures, 8 tables

Via

Access Paper or Ask Questions

Moment Unfolding

Jul 15, 2024

Krish Desai, Benjamin Nachman, Jesse Thaler

Abstract:Deconvolving ("unfolding'') detector distortions is a critical step in the comparison of cross section measurements with theoretical predictions in particle and nuclear physics. However, most existing approaches require histogram binning while many theoretical predictions are at the level of statistical moments. We develop a new approach to directly unfold distribution moments as a function of another observable without having to first discretize the data. Our Moment Unfolding technique uses machine learning and is inspired by Generative Adversarial Networks (GANs). We demonstrate the performance of this approach using jet substructure measurements in collider physics. With this illustrative example, we find that our Moment Unfolding protocol is more precise than bin-based approaches and is as or more precise than completely unbinned methods.

* 16 pages, 6 figures, 1 table

Via

Access Paper or Ask Questions

Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics

May 23, 2024

Jonas Spinner, Victor Bresó, Pim de Haan, Tilman Plehn, Jesse Thaler, Johann Brehmer

Abstract:Extracting scientific understanding from particle-physics experiments requires solving diverse learning problems with high precision and good data efficiency. We propose the Lorentz Geometric Algebra Transformer (L-GATr), a new multi-purpose architecture for high-energy physics. L-GATr represents high-energy data in a geometric algebra over four-dimensional space-time and is equivariant under Lorentz transformations, the symmetry group of relativistic kinematics. At the same time, the architecture is a Transformer, which makes it versatile and scalable to large systems. L-GATr is first demonstrated on regression and classification tasks from particle physics. We then construct the first Lorentz-equivariant generative model: a continuous normalizing flow based on an L-GATr network, trained with Riemannian flow matching. Across our experiments, L-GATr is on par with or outperforms strong domain-specific baselines.

* 10+12 pages, 5+2 figures, 2 tables

Via

Access Paper or Ask Questions

PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models

Mar 13, 2024

Siddharth Mishra-Sharma, Yiding Song, Jesse Thaler

Figure 1 for PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models

Figure 2 for PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models

Figure 3 for PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models

Figure 4 for PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models

Abstract:We present PAPERCLIP (Proposal Abstracts Provide an Effective Representation for Contrastive Language-Image Pre-training), a method which associates astronomical observations imaged by telescopes with natural language using a neural network model. The model is fine-tuned from a pre-trained Contrastive Language-Image Pre-training (CLIP) model using successful observing proposal abstracts and corresponding downstream observations, with the abstracts optionally summarized via guided generation using large language models (LLMs). Using observations from the Hubble Space Telescope (HST) as an example, we show that the fine-tuned model embodies a meaningful joint representation between observations and natural language through tests targeting image retrieval (i.e., finding the most relevant observations using natural language queries) and description retrieval (i.e., querying for astrophysical object classes and use cases most relevant to a given observation). Our study demonstrates the potential for using generalist foundation models rather than task-specific models for interacting with astronomical data by leveraging text as an interface.

* 17+6 pages, 3+1 figures, 5+2 tables

Via

Access Paper or Ask Questions

Moments of Clarity: Streamlining Latent Spaces in Machine Learning using Moment Pooling

Mar 13, 2024

Rikab Gambhir, Athis Osathapan, Jesse Thaler

Abstract:Many machine learning applications involve learning a latent representation of data, which is often high-dimensional and difficult to directly interpret. In this work, we propose "Moment Pooling", a natural extension of Deep Sets networks which drastically decrease latent space dimensionality of these networks while maintaining or even improving performance. Moment Pooling generalizes the summation in Deep Sets to arbitrary multivariate moments, which enables the model to achieve a much higher effective latent dimensionality for a fixed latent dimension. We demonstrate Moment Pooling on the collider physics task of quark/gluon jet classification by extending Energy Flow Networks (EFNs) to Moment EFNs. We find that Moment EFNs with latent dimensions as small as 1 perform similarly to ordinary EFNs with higher latent dimension. This small latent dimension allows for the internal representation to be directly visualized and interpreted, which in turn enables the learned internal jet representation to be extracted in closed form.

* 15+7 pages, 14 figures, 7 tables. Code available at https://github.com/athiso/moment and https://github.com/rikab/MomentAnalysis

Via

Access Paper or Ask Questions

EPiC-GAN: Equivariant Point Cloud Generation for Particle Jets

Jan 17, 2023

Erik Buhmann, Gregor Kasieczka, Jesse Thaler

Abstract:With the vast data-collecting capabilities of current and future high-energy collider experiments, there is an increasing demand for computationally efficient simulations. Generative machine learning models enable fast event generation, yet so far these approaches are largely constrained to fixed data structures and rigid detector geometries. In this paper, we introduce EPiC-GAN - equivariant point cloud generative adversarial network - which can produce point clouds of variable multiplicity. This flexible framework is based on deep sets and is well suited for simulating sprays of particles called jets. The generator and discriminator utilize multiple EPiC layers with an interpretable global latent vector. Crucially, the EPiC layers do not rely on pairwise information sharing between particles, which leads to a significant speed-up over graph- and transformer-based approaches with more complex relation diagrams. We demonstrate that EPiC-GAN scales well to large particle multiplicities and achieves high generation fidelity on benchmark jet generation tasks.

* 18 pages, 8 figures, 2 tables

Via

Access Paper or Ask Questions

Bias and Priors in Machine Learning Calibrations for High Energy Physics

May 10, 2022

Rikab Gambhir, Benjamin Nachman, Jesse Thaler

Figure 1 for Bias and Priors in Machine Learning Calibrations for High Energy Physics

Figure 2 for Bias and Priors in Machine Learning Calibrations for High Energy Physics

Figure 3 for Bias and Priors in Machine Learning Calibrations for High Energy Physics

Figure 4 for Bias and Priors in Machine Learning Calibrations for High Energy Physics

Abstract:Machine learning offers an exciting opportunity to improve the calibration of nearly all reconstructed objects in high-energy physics detectors. However, machine learning approaches often depend on the spectra of examples used during training, an issue known as prior dependence. This is an undesirable property of a calibration, which needs to be applicable in a variety of environments. The purpose of this paper is to explicitly highlight the prior dependence of some machine learning-based calibration strategies. We demonstrate how some recent proposals for both simulation-based and data-based calibrations inherit properties of the sample used for training, which can result in biases for downstream analyses. In the case of simulation-based calibration, we argue that our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls of prior dependence, whereas prior-independent data-based calibration remains an open problem.

* 17 pages, 7 figures, code available at https://github.com/hep-lbdl/calibrationpriors

Via

Access Paper or Ask Questions

Scaffolding Simulations with Deep Learning for High-dimensional Deconvolution

May 10, 2021

Anders Andreassen, Patrick T. Komiske, Eric M. Metodiev, Benjamin Nachman, Adi Suresh, Jesse Thaler

Figure 1 for Scaffolding Simulations with Deep Learning for High-dimensional Deconvolution

Figure 2 for Scaffolding Simulations with Deep Learning for High-dimensional Deconvolution

Abstract:A common setting for scientific inference is the ability to sample from a high-fidelity forward model (simulation) without having an explicit probability density of the data. We propose a simulation-based maximum likelihood deconvolution approach in this setting called OmniFold. Deep learning enables this approach to be naturally unbinned and (variable-, and) high-dimensional. In contrast to model parameter estimation, the goal of deconvolution is to remove detector distortions in order to enable a variety of down-stream inference tasks. Our approach is the deep learning generalization of the common Richardson-Lucy approach that is also called Iterative Bayesian Unfolding in particle physics. We show how OmniFold can not only remove detector distortions, but it can also account for noise processes and acceptance effects.

* ICLR simDL workshop 2021 (https://simdl.github.io/files/12.pdf)
* 6 pages, 1 figure, 1 table

Via

Access Paper or Ask Questions