Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

RuiKang OuYang

Learning Energy-Based Models from Stochastic Interpolants using Spatiotemporal Differences

May 26, 2026

Hanlin Yu, RuiKang OuYang, Partha Kaushik, Arto Klami, Michael U. Gutmann, Omar Chehab

Abstract:Learning an energy-based model from data samples is a central problem in machine learning. Many recent and popular methods, such as denoising score matching for training energy-based diffusion models, use stochastic interpolants to corrupt data samples at different noise levels indexed by a time variable. This defines a joint density over both the data space and time, and most methods learn its energy through either spatial or temporal differences. We identify distinct failure modes for both of these approaches. To solve them, we propose Spatiotemporal Noise-Contrastive Estimation (stNCE), a framework for learning the energy through joint spatiotemporal differences. stNCE unifies many existing methods and leads to new training objectives. Experiments on images and molecules demonstrate performance competitive with state-of-the-art density estimation methods.

Via

Access Paper or Ask Questions

Stochastic Interpolants in Hilbert Spaces

Feb 02, 2026

James Boran Yu, RuiKang OuYang, Julien Horwood, José Miguel Hernández-Lobato

Abstract:Although diffusion models have successfully extended to function-valued data, stochastic interpolants -- which offer a flexible way to bridge arbitrary distributions -- remain limited to finite-dimensional settings. This work bridges this gap by establishing a rigorous framework for stochastic interpolants in infinite-dimensional Hilbert spaces. We provide comprehensive theoretical foundations, including proofs of well-posedness and explicit error bounds. We demonstrate the effectiveness of the proposed framework for conditional generation, focusing particularly on complex PDE-based benchmarks. By enabling generative bridges between arbitrary functional distributions, our approach achieves state-of-the-art results, offering a powerful, general-purpose tool for scientific discovery.

* 8 pages, 1 figure, 2 tables

Via

Access Paper or Ask Questions

A Diffusive Classification Loss for Learning Energy-based Generative Models

Jan 28, 2026

Louis Grenioux, RuiKang OuYang, José Miguel Hernández-Lobato

Abstract:Score-based generative models have recently achieved remarkable success. While they are usually parameterized by the score, an alternative way is to use a series of time-dependent energy-based models (EBMs), where the score is obtained from the negative input-gradient of the energy. Crucially, EBMs can be leveraged not only for generation, but also for tasks such as compositional sampling or building Boltzmann Generators via Monte Carlo methods. However, training EBMs remains challenging. Direct maximum likelihood is computationally prohibitive due to the need for nested sampling, while score matching, though efficient, suffers from mode blindness. To address these issues, we introduce the Diffusive Classification (DiffCLF) objective, a simple method that avoids blindness while remaining computationally efficient. DiffCLF reframes EBM learning as a supervised classification problem across noise levels, and can be seamlessly combined with standard score-based objectives. We validate the effectiveness of DiffCLF by comparing the estimated energies against ground truth in analytical Gaussian mixture cases, and by applying the trained models to tasks such as model composition and Boltzmann Generator sampling. Our results show that DiffCLF enables EBMs with higher fidelity and broader applicability than existing approaches.

Via

Access Paper or Ask Questions

Progressive Tempering Sampler with Diffusion

Jun 05, 2025

Severi Rissanen, RuiKang OuYang, Jiajun He, Wenlin Chen, Markus Heinonen, Arno Solin, José Miguel Hernández-Lobato

Abstract:Recent research has focused on designing neural samplers that amortize the process of sampling from unnormalized densities. However, despite significant advancements, they still fall short of the state-of-the-art MCMC approach, Parallel Tempering (PT), when it comes to the efficiency of target evaluations. On the other hand, unlike a well-trained neural sampler, PT yields only dependent samples and needs to be rerun -- at considerable computational cost -- whenever new samples are required. To address these weaknesses, we propose the Progressive Tempering Sampler with Diffusion (PTSD), which trains diffusion models sequentially across temperatures, leveraging the advantages of PT to improve the training of neural samplers. We also introduce a novel method to combine high-temperature diffusion models to generate approximate lower-temperature samples, which are minimally refined using MCMC and used to train the next diffusion model. PTSD enables efficient reuse of sample information across temperature levels while generating well-mixed, uncorrelated samples. Our method significantly improves target evaluation efficiency, outperforming diffusion-based neural samplers.

* Accepted for publication at ICML 2025

Via

Access Paper or Ask Questions

No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers

Feb 10, 2025

Jiajun He, Yuanqi Du, Francisco Vargas, Dinghuai Zhang, Shreyas Padhy, RuiKang OuYang, Carla Gomes, José Miguel Hernández-Lobato

Figure 1 for No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers

Figure 2 for No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers

Figure 3 for No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers

Figure 4 for No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers

Abstract:We consider the sampling problem, where the aim is to draw samples from a distribution whose density is known only up to a normalization constant. Recent breakthroughs in generative modeling to approximate a high-dimensional data distribution have sparked significant interest in developing neural network-based methods for this challenging problem. However, neural samplers typically incur heavy computational overhead due to simulating trajectories during training. This motivates the pursuit of simulation-free training procedures of neural samplers. In this work, we propose an elegant modification to previous methods, which allows simulation-free training with the help of a time-dependent normalizing flow. However, it ultimately suffers from severe mode collapse. On closer inspection, we find that nearly all successful neural samplers rely on Langevin preconditioning to avoid mode collapsing. We systematically analyze several popular methods with various objective functions and demonstrate that, in the absence of Langevin preconditioning, most of them fail to adequately cover even a simple target. Finally, we draw attention to a strong baseline by combining the state-of-the-art MCMC method, Parallel Tempering (PT), with an additional generative model to shed light on future explorations of neural samplers.

* 21 pages, 5 figures, 6 tables

Via

Access Paper or Ask Questions

BEnDEM:A Boltzmann Sampler Based on Bootstrapped Denoising Energy Matching

Sep 15, 2024

RuiKang OuYang, Bo Qiang, José Miguel Hernández-Lobato

Figure 1 for BEnDEM:A Boltzmann Sampler Based on Bootstrapped Denoising Energy Matching

Figure 2 for BEnDEM:A Boltzmann Sampler Based on Bootstrapped Denoising Energy Matching

Figure 3 for BEnDEM:A Boltzmann Sampler Based on Bootstrapped Denoising Energy Matching

Figure 4 for BEnDEM:A Boltzmann Sampler Based on Bootstrapped Denoising Energy Matching

Abstract:Developing an efficient sampler capable of generating independent and identically distributed (IID) samples from a Boltzmann distribution is a crucial challenge in scientific research, e.g. molecular dynamics. In this work, we intend to learn neural samplers given energy functions instead of data sampled from the Boltzmann distribution. By learning the energies of the noised data, we propose a diffusion-based sampler, ENERGY-BASED DENOISING ENERGY MATCHING, which theoretically has lower variance and more complexity compared to related works. Furthermore, a novel bootstrapping technique is applied to EnDEM to balance between bias and variance. We evaluate EnDEM and BEnDEM on a 2-dimensional 40 Gaussian Mixture Model (GMM) and a 4-particle double-welling potential (DW-4). The experimental results demonstrate that BEnDEM can achieve state-of-the-art performance while being more robust.

* 20 pages, 7 figures, 2 tables

Via

Access Paper or Ask Questions