Justin Solomon

Do Neural Optimal Transport Solvers Work? A Continuous Wasserstein-2 Benchmark

Jun 03, 2021

Alexander Korotin, Lingxiao Li, Aude Genevay, Justin Solomon, Alexander Filippov, Evgeny Burnaev

Abstract: Despite the recent popularity of neural network-based solvers for optimal transport (OT), there is no standard quantitative way to evaluate their performance. In this paper, we address this issue for quadratic-cost transport -- specifically, computation of the Wasserstein-2 distance, a commonly used formulation of optimal transport in machine learning. To overcome the challenge of computing ground truth transport maps between continuous measures needed to assess these solvers, we use input-convex neural networks (ICNNs) to construct pairs of measures whose ground truth OT maps can be obtained analytically. This strategy yields pairs of continuous benchmark measures in high-dimensional spaces such as spaces of images. We thoroughly evaluate existing optimal transport solvers using these benchmark measures. Even though these solvers perform well in downstream tasks, many do not faithfully recover optimal transport maps. To investigate the cause of this discrepancy, we further test the solvers in a setting of image generation. Our study reveals crucial limitations of existing solvers and shows that increased OT accuracy does not necessarily correlate with better results downstream.
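The idea of building benchmark pairs with a known ground-truth map can be illustrated in one dimension. The sketch below is not the paper's code: it assumes a hand-picked convex potential psi(x) = x^2/2 + log(1 + e^x) in place of an ICNN, and relies on the standard fact that, for quadratic cost, the gradient of a convex function is the optimal map from a measure to its push-forward. The function names and the mean-squared-error score are illustrative choices.

```python
import math
import random


def true_map(x):
    """Gradient of the convex potential psi(x) = x^2/2 + log(1 + exp(x)).

    Because psi is convex, this monotone map is the quadratic-cost OT map
    from any source measure to its push-forward (assumed stand-in for an ICNN).
    """
    return x + 1.0 / (1.0 + math.exp(-x))


def make_benchmark_pair(n, seed=0):
    """Sample a source measure and push it forward through the known map."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]
    target = [true_map(x) for x in source]
    return source, target


def map_error(candidate, source):
    """Mean squared deviation of a candidate map from the ground-truth map."""
    return sum((candidate(x) - true_map(x)) ** 2 for x in source) / len(source)


source, target = make_benchmark_pair(1000)
print(map_error(true_map, source))     # the exact map has zero error
print(map_error(lambda x: x, source))  # the identity map does not
```

A solver under test would be trained on samples from `source` and `target` only, and its recovered map would then be scored with `map_error`.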

Large-Scale Wasserstein Gradient Flows

Jun 01, 2021

Petr Mokrov, Alexander Korotin, Lingxiao Li, Aude Genevay, Justin Solomon, Evgeny Burnaev

Abstract: Wasserstein gradient flows provide a powerful means of understanding and solving many diffusion equations. Specifically, Fokker-Planck equations, which model the diffusion of probability measures, can be understood as gradient descent over entropy functionals in Wasserstein space. This equivalence, introduced by Jordan, Kinderlehrer and Otto, inspired the so-called JKO scheme to approximate these diffusion processes via an implicit discretization of the gradient flow in Wasserstein space. Solving the optimization problem associated with each JKO step, however, presents serious computational challenges. We introduce a scalable method to approximate Wasserstein gradient flows, targeted to machine learning applications. Our approach relies on input-convex neural networks (ICNNs) to discretize the JKO steps, which can be optimized by stochastic gradient descent. Unlike previous work, our method does not require domain discretization or particle simulation. As a result, we can sample from the measure at each time step of the diffusion and compute its probability density. We demonstrate our algorithm's performance by computing diffusions following the Fokker-Planck equation and apply it to unnormalized density sampling as well as nonlinear filtering.
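For reference, a single JKO step is usually written as the proximal-type problem below. The notation is an assumption, since the abstract fixes none: $\mathcal{F}$ is the functional being descended, $h > 0$ is the time step, $\rho^{(k)}$ is the measure at step $k$, and $\mathcal{P}_2(\mathbb{R}^d)$ is the set of probability measures with finite second moment.

```latex
\rho^{(k+1)} \;=\; \operatorname*{arg\,min}_{\rho \in \mathcal{P}_2(\mathbb{R}^d)}
\left\{ \mathcal{F}(\rho) \;+\; \frac{1}{2h}\, W_2^2\!\left(\rho, \rho^{(k)}\right) \right\}
```

This is the Wasserstein-space analogue of an implicit Euler step: the squared Wasserstein-2 term keeps $\rho^{(k+1)}$ close to $\rho^{(k)}$ while $\mathcal{F}$ is decreased.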

MarioNette: Self-Supervised Sprite Learning

Apr 29, 2021

Dmitriy Smirnov, Michael Gharbi, Matthew Fisher, Vitor Guizilini, Alexei A. Efros, Justin Solomon

Abstract: Visual content often contains recurring elements. Text is made up of glyphs from the same font; animations, such as cartoons or video games, are composed of sprites moving around the screen; and natural videos frequently have repeated views of objects. In this paper, we propose a deep learning approach for obtaining a graphically disentangled representation of recurring elements in a completely self-supervised manner. By jointly learning a dictionary of texture patches and training a network that places them onto a canvas, we effectively deconstruct sprite-based content into a sparse, consistent, and interpretable representation that can be easily used in downstream tasks. Our framework offers a promising approach for discovering recurring patterns in image collections without supervision.

Improving Approximate Optimal Transport Distances using Quantization

Feb 25, 2021

Gaspard Beugnot, Aude Genevay, Kristjan Greenewald, Justin Solomon

Abstract: Optimal transport (OT) is a popular tool in machine learning to compare probability measures geometrically, but it comes with a substantial computational burden. Linear programming algorithms for computing OT distances scale cubically in the size of the input, making OT impractical in the large-sample regime. We introduce a practical algorithm, which relies on a quantization step, to estimate OT distances between measures given cheap sample access. We also provide a variant of our algorithm to improve the performance of approximate solvers, focusing on those for entropy-regularized transport. We give theoretical guarantees on the benefits of this quantization step and display experiments showing that it behaves well in practice, providing a practical approximation algorithm that can be used as a drop-in replacement for existing OT estimators.

Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization

Feb 02, 2021

Alexander Korotin, Lingxiao Li, Justin Solomon, Evgeny Burnaev

Abstract: Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport. In this paper, we present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures, which are not restricted to being discrete. While past approaches rely on entropic or quadratic regularization, we employ input-convex neural networks and cycle-consistency regularization to avoid introducing bias. As a result, our approach does not resort to minimax optimization. We provide theoretical analysis on error bounds as well as empirical evidence of the effectiveness of the proposed approach in low-dimensional qualitative scenarios and high-dimensional quantitative experiments.
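To make the "weighted average of probability measures" concrete, here is a minimal sketch for the one-dimensional case only, where the Wasserstein-2 barycenter has a closed form: its quantile function is the weighted average of the input quantile functions. This is a textbook special case, not the neural algorithm the abstract describes. The function name is hypothetical, and the code assumes every input is an empirical measure with the same number of equally weighted samples.

```python
def barycenter_1d(samples_list, weights):
    """Wasserstein-2 barycenter of 1D empirical measures with equal sample counts.

    Sorting each sample set gives its empirical quantiles; averaging the
    i-th quantiles with the given weights gives the barycenter's i-th atom.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    sorted_lists = [sorted(s) for s in samples_list]
    n = len(sorted_lists[0])
    if any(len(s) != n for s in sorted_lists):
        raise ValueError("all measures need the same number of samples")
    return [
        sum(w * s[i] for w, s in zip(weights, sorted_lists))
        for i in range(n)
    ]


# Two measures supported on {0, 2} and {10, 12}, equally weighted.
print(barycenter_1d([[2.0, 0.0], [10.0, 12.0]], [0.5, 0.5]))  # [5.0, 7.0]
```

Note that the result is a shifted copy of the inputs rather than a mixture of them, which is what distinguishes the barycenter from an ordinary average of densities.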

$k$-Variance: A Clustered Notion of Variance

Dec 13, 2020

Justin Solomon, Kristjan Greenewald, Haikady N. Nagaraja

Abstract: We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings. $K$-variance measures the expected cost of matching two sets of $k$ samples from a distribution to each other, capturing local rather than global information about a measure as $k$ increases; it is easily approximated stochastically using sampling and linear programming. In addition to defining $k$-variance and proving its basic properties, we provide in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of $\mathbb R^n$. We conclude with experiments and open problems motivated by this new way to summarize distributional shape.
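The sampling-based approximation can be sketched as a Monte Carlo estimate of the expected matching cost. Several things here are assumptions rather than the paper's definition: the cost is taken to be squared Euclidean distance averaged over the $k$ matched pairs, any normalization constant in the actual definition of $k$-variance is omitted, and the optimal matching is found by brute force over permutations (practical only for small $k$) instead of linear programming. The function names are hypothetical.

```python
import itertools
import random


def matching_cost(xs, ys):
    """Minimum average squared-distance cost of a perfect matching of xs to ys.

    Brute force over all k! permutations; an assignment or LP solver
    would replace this for anything beyond small k.
    """
    k = len(xs)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best = min(
        sum(sq_dist(xs[i], ys[perm[i]]) for i in range(k))
        for perm in itertools.permutations(range(k))
    )
    return best / k


def estimate_matching_cost(sampler, k, trials=200, seed=0):
    """Monte Carlo estimate of the expected matching cost between two k-samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [sampler(rng) for _ in range(k)]
        ys = [sampler(rng) for _ in range(k)]
        total += matching_cost(xs, ys)
    return total / trials


# Example: standard normal samples in the plane, matched in sets of k = 4.
def gauss_2d(rng):
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


print(estimate_matching_cost(gauss_2d, k=4))
```

For $k = 1$ and this cost, the estimate approximates $\mathbb{E}\|X - Y\|^2$ for independent draws $X, Y$, which is twice the ordinary variance (trace of the covariance); this is the sense in which the quantity generalizes variance.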

Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection

Sep 24, 2020

Yue Wang, Alireza Fathi, Jiajun Wu, Thomas Funkhouser, Justin Solomon

Abstract: A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing. We use knowledge distillation to bridge the gap between a model trained on high-quality inputs at training time and another tested on low-quality inputs at inference time. In particular, we design a two-stage training pipeline for point cloud object detection. First, we train an object detection model on dense point clouds, which are generated from multiple frames using extra information only available at training time. Then, we train the model's identical counterpart on sparse single-frame point clouds with consistency regularization on features from both models. We show that this procedure improves performance on low-quality data during testing, without additional overhead.

* The Workshop on Perception for Autonomous Driving at ECCV 2020

Continuous Regularized Wasserstein Barycenters

Aug 28, 2020

Lingxiao Li, Aude Genevay, Mikhail Yurochkin, Justin Solomon

Abstract: Wasserstein barycenters provide a geometrically meaningful way to aggregate probability distributions, built on the theory of optimal transport. They are difficult to compute in practice, however, leading previous work to restrict their supports to finite sets of points. Leveraging a new dual formulation for the regularized Wasserstein barycenter problem, we introduce a stochastic algorithm that constructs a continuous approximation of the barycenter. We establish strong duality and use the corresponding primal-dual relationship to parametrize the barycenter implicitly using the dual potentials of regularized transport problems. The resulting problem can be solved with stochastic gradient descent, which yields an efficient online algorithm to approximate the barycenter of continuous distributions given sample access. We demonstrate the effectiveness of our approach and compare against previous work on synthetic examples and real-world applications.

Pillar-based Object Detection for Autonomous Driving

Jul 26, 2020

Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Thomas Funkhouser, Justin Solomon

Abstract: We present a simple and flexible object detection framework optimized for autonomous driving. Building on the observation that point clouds in this application are extremely sparse, we propose a practical pillar-based approach to fix the imbalance issue caused by anchors. In particular, our algorithm incorporates a cylindrical projection into multi-view feature learning, predicts bounding box parameters per pillar rather than per point or per anchor, and includes an aligned pillar-to-point projection module to improve the final prediction. Our anchor-free approach avoids the hyperparameter search associated with past methods, simplifying 3D object detection while significantly improving upon the state of the art.

* Accepted to ECCV 2020

Model Fusion with Kullback-Leibler Divergence

Jul 13, 2020

Sebastian Claici, Mikhail Yurochkin, Soumya Ghosh, Justin Solomon

Abstract: We propose a method to fuse posterior distributions learned from heterogeneous datasets. Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors and proceeds using a simple assign-and-average approach. The components of the dataset posteriors are assigned to the proposed global model components by solving a regularized variant of the assignment problem. The global components are then updated based on these assignments by their mean under a KL divergence. For exponential family variational distributions, our formulation leads to an efficient non-parametric algorithm for computing the fused model. Our algorithm is easy to describe and implement, efficient, and competitive with the state of the art on motion capture analysis, topic modeling, and federated learning of Bayesian neural networks.

* ICML 2020
