Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Samy Wu Fung

Colorado School of Mines, Department of Applied Mathematics and Statistics

Fixed-Point Neural Optimal Transport without Implicit Differentiation

May 11, 2026

Yesom Park, Eric Gelphman, Stanley Osher, Samy Wu Fung

Abstract:We propose an implicit neural formulation of optimal transport that eliminates adversarial min--max optimization and multi-network architectures commonly used in existing approaches. Our key idea is to parameterize a single potential in the Kantorovich dual and reformulate the associated c-transform as a proximal fixed-point problem. This yields a stable single-network framework in which dual feasibility is enforced exactly through proximal optimality conditions rather than adversarial training. Despite the inner fixed-point computation, gradients can be computed without differentiating through the fixed-point iterations, enabling efficient training without requiring implicit differentiation. We further establish convergence of stochastic gradient descent. The resulting framework is efficient, scalable, and broadly applicable: it simultaneously recovers forward and backward transport maps and naturally extends to class-conditional settings. Experiments on high-dimensional Gaussian benchmarks, physical datasets, and image translation tasks demonstrate strong transport accuracy together with improved training stability and favorable computational and memory efficiency.

* 37 pages, submitted to SIAM Journal on Mathematical Data Science (currently under review)

Via

Access Paper or Ask Questions

Probabilistic Gaussian Homotopy: A Probability-Space Continuation Framework for Nonconvex Optimization

Mar 13, 2026

Eshed Gal, Samy Wu Fung, Eldad Haber

Abstract:We introduce Probabilistic Gaussian Homotopy (PGH), a probability-space continuation framework for nonconvex optimization. Unlike classical Gaussian homotopy, which smooths the objective and uniformly averages gradients, PGH deforms the associated Boltzmann distribution and induces Boltzmann-weighted aggregation of perturbed gradients, which exponentially biases descent directions toward low-energy regions. We show that PGH corresponds to a log-sum-exp (soft-min) homotopy that smooths a nonconvex objective at scale $λ>0$ and recovers the original objective as $λ\to 0$, yielding a posterior-mean generalization of the Moreau envelope, and we derive a dynamical system governing minimizer evolution along an annealed homotopy path. This establishes a principled connection between Gaussian continuation, Bayesian denoising, and diffusion-style smoothing. We further propose Probabilistic Gaussian Homotopy Optimization (PGHO), a practical stochastic algorithm based on Monte Carlo gradient estimation, and demonstrate strong performance on high-dimensional nonconvex benchmarks and sparse recovery problems where classical gradient methods and objective-space smoothing frequently fail.

Via

Access Paper or Ask Questions

On the Convergence of Jacobian-Free Backpropagation for Optimal Control Problems with Implicit Hamiltonians

Jan 31, 2026

Eric Gelphman, Deepanshu Verma, Nicole Tianjiao Yang, Stanley Osher, Samy Wu Fung

Abstract:Optimal feedback control with implicit Hamiltonians poses a fundamental challenge for learning-based value function methods due to the absence of closed-form optimal control laws. Recent work~\cite{gelphman2025end} introduced an implicit deep learning approach using Jacobian-Free Backpropagation (JFB) to address this setting, but only established sample-wise descent guarantees. In this paper, we establish convergence guarantees for JFB in the stochastic minibatch setting, showing that the resulting updates converge to stationary points of the expected optimal control objective. We further demonstrate scalability on substantially higher-dimensional problems, including multi-agent optimal consumption and swarm-based quadrotor and bicycle control. Together, our results provide both theoretical justification and empirical evidence for using JFB in high-dimensional optimal control with implicit Hamiltonians.

* 19 Pages, 6 figures, 1 table. Submitted to ICML and is pending review

Via

Access Paper or Ask Questions

A Generalization Bound for a Family of Implicit Networks

Oct 09, 2024

Samy Wu Fung, Benjamin Berkels

Figure 1 for A Generalization Bound for a Family of Implicit Networks

Figure 2 for A Generalization Bound for a Family of Implicit Networks

Abstract:Implicit networks are a class of neural networks whose outputs are defined by the fixed point of a parameterized operator. They have enjoyed success in many applications including natural language processing, image processing, and numerous other applications. While they have found abundant empirical success, theoretical work on its generalization is still under-explored. In this work, we consider a large family of implicit networks defined parameterized contractive fixed point operators. We show a generalization bound for this class based on a covering number argument for the Rademacher complexity of these architectures.

Via

Access Paper or Ask Questions

On Logical Extrapolation for Mazes with Recurrent and Implicit Networks

Oct 03, 2024

Brandon Knutson, Amandin Chyba Rabeendran, Michael Ivanitskiy, Jordan Pettyjohn, Cecilia Diniz-Behn, Samy Wu Fung, Daniel McKenzie

Figure 1 for On Logical Extrapolation for Mazes with Recurrent and Implicit Networks

Figure 2 for On Logical Extrapolation for Mazes with Recurrent and Implicit Networks

Figure 3 for On Logical Extrapolation for Mazes with Recurrent and Implicit Networks

Figure 4 for On Logical Extrapolation for Mazes with Recurrent and Implicit Networks

Abstract:Recent work has suggested that certain neural network architectures-particularly recurrent neural networks (RNNs) and implicit neural networks (INNs) are capable of logical extrapolation. That is, one may train such a network on easy instances of a specific task and then apply it successfully to more difficult instances of the same task. In this paper, we revisit this idea and show that (i) The capacity for extrapolation is less robust than previously suggested. Specifically, in the context of a maze-solving task, we show that while INNs (and some RNNs) are capable of generalizing to larger maze instances, they fail to generalize along axes of difficulty other than maze size. (ii) Models that are explicitly trained to converge to a fixed point (e.g. the INN we test) are likely to do so when extrapolating, while models that are not (e.g. the RNN we test) may exhibit more exotic limiting behaviour such as limit cycles, even when they correctly solve the problem. Our results suggest that (i) further study into why such networks extrapolate easily along certain axes of difficulty yet struggle with others is necessary, and (ii) analyzing the dynamics of extrapolation may yield insights into designing more efficient and interpretable logical extrapolators.

Via

Access Paper or Ask Questions

Structured World Representations in Maze-Solving Transformers

Dec 05, 2023

Michael Igorevich Ivanitskiy, Alex F. Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia Diniz Behn(+2 more)

Figure 1 for Structured World Representations in Maze-Solving Transformers

Figure 2 for Structured World Representations in Maze-Solving Transformers

Figure 3 for Structured World Representations in Maze-Solving Transformers

Figure 4 for Structured World Representations in Maze-Solving Transformers

Abstract:Transformer models underpin many recent advances in practical machine learning applications, yet understanding their internal behavior continues to elude researchers. Given the size and complexity of these models, forming a comprehensive picture of their inner workings remains a significant challenge. To this end, we set out to understand small transformer models in a more tractable setting: that of solving mazes. In this work, we focus on the abstractions formed by these models and find evidence for the consistent emergence of structured internal representations of maze topology and valid paths. We demonstrate this by showing that the residual stream of only a single token can be linearly decoded to faithfully reconstruct the entire maze. We also find that the learned embeddings of individual tokens have spatial structure. Furthermore, we take steps towards deciphering the circuity of path-following by identifying attention heads (dubbed $\textit{adjacency heads}$), which are implicated in finding valid subsequent tokens.

* 15 pages, 18 figures, 15 tables. Corresponding author: Michael Ivanitskiy (mivanits@mines.edu). Code available at https://github.com/understanding-search/structured-representations-maze-transformers

Via

Access Paper or Ask Questions

A Configurable Library for Generating and Manipulating Maze Datasets

Sep 19, 2023

Michael Igorevich Ivanitskiy, Rusheb Shah, Alex F. Spies, Tilman Räuker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume Corlouer, Cecilia Diniz Behn(+1 more)

Figure 1 for A Configurable Library for Generating and Manipulating Maze Datasets

Figure 2 for A Configurable Library for Generating and Manipulating Maze Datasets

Figure 3 for A Configurable Library for Generating and Manipulating Maze Datasets

Figure 4 for A Configurable Library for Generating and Manipulating Maze Datasets

Abstract:Understanding how machine learning models respond to distributional shifts is a key research challenge. Mazes serve as an excellent testbed due to varied generation algorithms offering a nuanced platform to simulate both subtle and pronounced distributional shifts. To enable systematic investigations of model behavior on out-of-distribution data, we present $\texttt{maze-dataset}$, a comprehensive library for generating, processing, and visualizing datasets consisting of maze-solving tasks. With this library, researchers can easily create datasets, having extensive control over the generation algorithm used, the parameters fed to the algorithm of choice, and the filters that generated mazes must satisfy. Furthermore, it supports multiple output formats, including rasterized and text-based, catering to convolutional neural networks and autoregressive transformer models. These formats, along with tools for visualizing and converting between them, ensure versatility and adaptability in research applications.

* 9 pages, 5 figures, 1 table. Corresponding author: Michael Ivanitskiy (mivanits@umich.edu). Code available at https://github.com/understanding-search/maze-dataset

Via

Access Paper or Ask Questions

Faster Predict-and-Optimize with Three-Operator Splitting

Jan 31, 2023

Daniel McKenzie, Samy Wu Fung, Howard Heaton

Figure 1 for Faster Predict-and-Optimize with Three-Operator Splitting

Figure 2 for Faster Predict-and-Optimize with Three-Operator Splitting

Figure 3 for Faster Predict-and-Optimize with Three-Operator Splitting

Figure 4 for Faster Predict-and-Optimize with Three-Operator Splitting

Abstract:In many practical settings, a combinatorial problem must be repeatedly solved with similar, but distinct parameters w. Yet, w is not directly observed; only contextual data d that correlates with w is available. It is tempting to use a neural network to predict w given d, but training such a model requires reconciling the discrete nature of combinatorial optimization with the gradient-based frameworks used to train neural networks. One approach to overcoming this issue is to consider a continuous relaxation of the combinatorial problem. While existing such approaches have shown to be highly effective on small problems (10-100 variables) they do not scale well to large problems. In this work, we show how recent results in operator splitting can be used to design such a system which is easy to train and scales effortlessly to problems with thousands of variables.

Via

Access Paper or Ask Questions

Taming Hyperparameter Tuning in Continuous Normalizing Flows Using the JKO Scheme

Nov 30, 2022

Alexander Vidal, Samy Wu Fung, Luis Tenorio, Stanley Osher, Levon Nurbekyan

Abstract:A normalizing flow (NF) is a mapping that transforms a chosen probability distribution to a normal distribution. Such flows are a common technique used for data generation and density estimation in machine learning and data science. The density estimate obtained with a NF requires a change of variables formula that involves the computation of the Jacobian determinant of the NF transformation. In order to tractably compute this determinant, continuous normalizing flows (CNF) estimate the mapping and its Jacobian determinant using a neural ODE. Optimal transport (OT) theory has been successfully used to assist in finding CNFs by formulating them as OT problems with a soft penalty for enforcing the standard normal distribution as a target measure. A drawback of OT-based CNFs is the addition of a hyperparameter, $\alpha$, that controls the strength of the soft penalty and requires significant tuning. We present JKO-Flow, an algorithm to solve OT-based CNF without the need of tuning $\alpha$. This is achieved by integrating the OT CNF framework into a Wasserstein gradient flow framework, also known as the JKO scheme. Instead of tuning $\alpha$, we repeatedly solve the optimization problem for a fixed $\alpha$ effectively performing a JKO update with a time-step $\alpha$. Hence we obtain a "divide and conquer" algorithm by repeatedly solving simpler problems instead of solving a potentially harder problem with large $\alpha$.

Via

Access Paper or Ask Questions

Explainable AI via Learning to Optimize

Apr 29, 2022

Howard Heaton, Samy Wu Fung

Figure 1 for Explainable AI via Learning to Optimize

Figure 2 for Explainable AI via Learning to Optimize

Figure 3 for Explainable AI via Learning to Optimize

Figure 4 for Explainable AI via Learning to Optimize

Abstract:Indecipherable black boxes are common in machine learning (ML), but applications increasingly require explainable artificial intelligence (XAI). The core of XAI is to establish transparent and interpretable data-driven algorithms. This work provides concrete tools for XAI in situations where prior knowledge must be encoded and untrustworthy inferences flagged. We use the "learn to optimize" (L2O) methodology wherein each inference solves a data-driven optimization problem. Our L2O models are straightforward to implement, directly encode prior knowledge, and yield theoretical guarantees (e.g. satisfaction of constraints). We also propose use of interpretable certificates to verify whether model inferences are trustworthy. Numerical examples are provided in the applications of dictionary-based signal recovery, CT imaging, and arbitrage trading of cryptoassets.

Via

Access Paper or Ask Questions