Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Stefano Ermon

Imitation with Neural Density Models

Oct 19, 2020

Kuno Kim, Akshat Jindal, Yang Song, Jiaming Song, Yanan Sui, Stefano Ermon

Figure 1 for Imitation with Neural Density Models

Figure 2 for Imitation with Neural Density Models

Figure 3 for Imitation with Neural Density Models

Figure 4 for Imitation with Neural Density Models

Abstract:We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning (RL) using the density as a reward. Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator. We present a practical IL algorithm, Neural Density Imitation (NDI), which obtains state-of-the-art demonstration efficiency on benchmark control tasks.

Via

Access Paper or Ask Questions

Denoising Diffusion Implicit Models

Oct 06, 2020

Jiaming Song, Chenlin Meng, Stefano Ermon

Figure 1 for Denoising Diffusion Implicit Models

Figure 2 for Denoising Diffusion Implicit Models

Figure 3 for Denoising Diffusion Implicit Models

Figure 4 for Denoising Diffusion Implicit Models

Abstract:Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \times$ to $50 \times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.

Via

Access Paper or Ask Questions

Understanding Classifier Mistakes with Generative Models

Oct 05, 2020

Laëtitia Shao, Yang Song, Stefano Ermon

Figure 1 for Understanding Classifier Mistakes with Generative Models

Figure 2 for Understanding Classifier Mistakes with Generative Models

Figure 3 for Understanding Classifier Mistakes with Generative Models

Figure 4 for Understanding Classifier Mistakes with Generative Models

Abstract:Although deep neural networks are effective on supervised learning tasks, they have been shown to be brittle. They are prone to overfitting on their training distribution and are easily fooled by small adversarial perturbations. In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize. We propose a generative model of the features extracted by a classifier, and show using rigorous hypothesis testing that errors tend to occur when features are assigned low-probability by our model. From this observation, we develop a detection criteria for samples on which a classifier is likely to fail at test time. In particular, we test against three different sources of classification failures: mistakes made on the test set due to poor model generalization, adversarial samples and out-of-distribution samples. Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.

* 17 pages, 10 figures. Submitted to ICLR2021

Via

Access Paper or Ask Questions

Using satellite imagery to understand and promote sustainable development

Sep 23, 2020

Marshall Burke, Anne Driscoll, David B. Lobell, Stefano Ermon

Figure 1 for Using satellite imagery to understand and promote sustainable development

Figure 2 for Using satellite imagery to understand and promote sustainable development

Figure 3 for Using satellite imagery to understand and promote sustainable development

Figure 4 for Using satellite imagery to understand and promote sustainable development

Abstract:Accurate and comprehensive measurements of a range of sustainable development outcomes are fundamental inputs into both research and policy. We synthesize the growing literature that uses satellite imagery to understand these outcomes, with a focus on approaches that combine imagery with machine learning. We quantify the paucity of ground data on key human-related outcomes and the growing abundance and resolution (spatial, temporal, and spectral) of satellite imagery. We then review recent machine learning approaches to model-building in the context of scarce and noisy training data, highlighting how this noise often leads to incorrect assessment of models' predictive performance. We quantify recent model performance across multiple sustainable development domains, discuss research and policy applications, explore constraints to future progress, and highlight key research directions for the field.

Via

Access Paper or Ask Questions

Privacy Preserving Recalibration under Domain Shift

Aug 21, 2020

Rachel Luo, Shengjia Zhao, Jiaming Song, Jonathan Kuck, Stefano Ermon, Silvio Savarese

Figure 1 for Privacy Preserving Recalibration under Domain Shift

Figure 2 for Privacy Preserving Recalibration under Domain Shift

Figure 3 for Privacy Preserving Recalibration under Domain Shift

Figure 4 for Privacy Preserving Recalibration under Domain Shift

Abstract:Classifiers deployed in high-stakes real-world applications must output calibrated confidence scores, i.e. their predicted probabilities should reflect empirical frequencies. Recalibration algorithms can greatly improve a model's probability estimates; however, existing algorithms are not applicable in real-world situations where the test data follows a different distribution from the training data, and privacy preservation is paramount (e.g. protecting patient records). We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints. This framework allows us to adapt existing recalibration algorithms to satisfy differential privacy while remaining effective for domain-shift situations. Guided by our framework, we also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets. In an extensive empirical study, we find that our algorithm improves calibration on domain-shift benchmarks under the constraints of differential privacy. On the 15 highest severity perturbations of the ImageNet-C dataset, our method achieves a median ECE of 0.029, over 2x better than the next best recalibration method and almost 5x better than without recalibration.

Via

Access Paper or Ask Questions

HiPPO: Recurrent Memory with Optimal Polynomial Projections

Aug 17, 2020

Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Re

Figure 1 for HiPPO: Recurrent Memory with Optimal Polynomial Projections

Figure 2 for HiPPO: Recurrent Memory with Optimal Polynomial Projections

Figure 3 for HiPPO: Recurrent Memory with Optimal Polynomial Projections

Figure 4 for HiPPO: Recurrent Memory with Optimal Polynomial Projections

Abstract:A central problem in learning from sequential data is representing cumulative history in an incremental fashion as more data is processed. We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto polynomial bases. Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal solution to a natural online function approximation problem. As special cases, our framework yields a short derivation of the recent Legendre Memory Unit (LMU) from first principles, and generalizes the ubiquitous gating mechanism of recurrent neural networks such as GRUs. This formal framework yields a new memory update mechanism (HiPPO-LegS) that scales through time to remember all history, avoiding priors on the timescale. HiPPO-LegS enjoys the theoretical benefits of timescale robustness, fast updates, and bounded gradients. By incorporating the memory dynamics into recurrent neural networks, HiPPO RNNs can empirically capture complex temporal dependencies. On the benchmark permuted MNIST dataset, HiPPO-LegS sets a new state-of-the-art accuracy of 98.3%. Finally, on a novel trajectory classification task testing robustness to out-of-distribution timescales and missing data, HiPPO-LegS outperforms RNN and neural ODE baselines by 25-40% accuracy.

Via

Access Paper or Ask Questions

Multi-label Contrastive Predictive Coding

Jul 20, 2020

Jiaming Song, Stefano Ermon

Figure 1 for Multi-label Contrastive Predictive Coding

Figure 2 for Multi-label Contrastive Predictive Coding

Figure 3 for Multi-label Contrastive Predictive Coding

Figure 4 for Multi-label Contrastive Predictive Coding

Abstract:Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC). A lower bound on MI can be obtained from a multi-class classification problem, where a critic attempts to distinguish a positive sample drawn from the underlying joint distribution from $(m-1)$ negative samples drawn from a suitable proposal distribution. Using this approach, MI estimates are bounded above by $\log m$, and could thus severely underestimate unless $m$ is very large. To overcome this limitation, we introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time. We show that using the same amount of negative samples, multi-label CPC is able to exceed the $\log m$ bound, while still being a valid lower bound of mutual information. We demonstrate that the proposed approach is able to lead to better mutual information estimation, gain empirical improvements in unsupervised representation learning, and beat a current state-of-the-art knowledge distillation method over 10 out of 13 tasks.

Via

Access Paper or Ask Questions

Efficient Learning of Generative Models via Finite-Difference Score Matching

Jul 07, 2020

Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu

Figure 1 for Efficient Learning of Generative Models via Finite-Difference Score Matching

Figure 2 for Efficient Learning of Generative Models via Finite-Difference Score Matching

Figure 3 for Efficient Learning of Generative Models via Finite-Difference Score Matching

Figure 4 for Efficient Learning of Generative Models via Finite-Difference Score Matching

Abstract:Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive in respect to memory and computation even with automatic differentiation. As a typical example in generative modeling, score matching (SM) involves the optimization of the trace of a Hessian. To improve computing efficiency, we rewrite the SM objective and its variants in terms of directional derivatives, and present a generic strategy to efficiently approximate any-order directional derivative with finite difference (FD). Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations. Thus, it reduces the total computational cost while also improving numerical stability. We provide two instantiations by reformulating variants of SM objectives into the FD forms. Empirically, we demonstrate that our methods produce results comparable to the gradient-based counterparts while being much more computationally efficient.

Via

Access Paper or Ask Questions

Individual Calibration with Randomized Forecasting

Jul 05, 2020

Shengjia Zhao, Tengyu Ma, Stefano Ermon

Figure 1 for Individual Calibration with Randomized Forecasting

Figure 2 for Individual Calibration with Randomized Forecasting

Figure 3 for Individual Calibration with Randomized Forecasting

Figure 4 for Individual Calibration with Randomized Forecasting

Abstract:Machine learning applications often require calibrated predictions, e.g. a 90\% credible interval should contain the true outcome 90\% of the times. However, typical definitions of calibration only require this to hold on average, and offer no guarantees on predictions made on individual samples. Thus, predictions can be systematically over or under confident on certain subgroups, leading to issues of fairness and potential vulnerabilities. We show that calibration for individual samples is possible in the regression setup if the predictions are randomized, i.e. outputting randomized credible intervals. Randomization removes systematic bias by trading off bias with variance. We design a training objective to enforce individual calibration and use it to train randomized regression functions. The resulting models are more calibrated for arbitrarily chosen subgroups of the data, and can achieve higher utility in decision making against adversaries that exploit miscalibrated predictions.

Via

Access Paper or Ask Questions

Belief Propagation Neural Networks

Jul 01, 2020

Jonathan Kuck, Shuvam Chakraborty, Hao Tang, Rachel Luo, Jiaming Song, Ashish Sabharwal, Stefano Ermon

Figure 1 for Belief Propagation Neural Networks

Figure 2 for Belief Propagation Neural Networks

Figure 3 for Belief Propagation Neural Networks

Figure 4 for Belief Propagation Neural Networks

Abstract:Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that operate on factor graphs and generalize Belief Propagation (BP). In its strictest form, a BPNN layer (BPNN-D) is a learned iterative operator that provably maintains many of the desirable properties of BP for any choice of the parameters. Empirically, we show that by training BPNN-D learns to perform the task better than the original BP: it converges 1.7x faster on Ising models while providing tighter bounds. On challenging model counting problems, BPNNs compute estimates 100's of times faster than state-of-the-art handcrafted methods, while returning an estimate of comparable quality.

Via

Access Paper or Ask Questions