Stefanie Jegelka

Structured Optimal Transport

Dec 17, 2017
David Alvarez-Melis, Tommi S. Jaakkola, Stefanie Jegelka

Optimal transport has recently gained interest in machine learning for applications ranging from domain adaptation and sentence similarity to deep learning. Yet its ability to capture frequently occurring structure beyond the "ground metric" is limited. In this work, we develop a nonlinear generalization of (discrete) optimal transport that can reflect much additional structure. We demonstrate how to leverage the geometry of this new model for fast algorithms, and explore connections and properties. Illustrative experiments highlight the benefit of the induced structured couplings for tasks in domain adaptation and natural language processing.

Graph-Sparse Logistic Regression

Dec 15, 2017
Alexander LeNail, Ludwig Schmidt, Johnathan Li, Tobias Ehrenberger, Karen Sachs, Stefanie Jegelka, Ernest Fraenkel

We introduce Graph-Sparse Logistic Regression (GSLR), a new classification algorithm for the case in which the support should be sparse but connected on a graph. We validate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package.

* 7 pages, 2 figures, NIPS DISCML workshop paper

Polynomial Time Algorithms for Dual Volume Sampling

Nov 16, 2017
Chengtao Li, Stefanie Jegelka, Suvrit Sra

We study dual volume sampling, a method for selecting k columns from an n x m short and wide matrix (n <= k <= m) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix. This method was proposed by Avron and Boutsidis (2013), who showed it to be a promising method for column subset selection and its multiple applications. However, its wider adoption has been hampered by the lack of polynomial time sampling algorithms. We remove this hindrance by developing an exact (randomized) polynomial time sampling algorithm as well as its derandomization. Thereafter, we study dual volume sampling via the theory of real stable polynomials and prove that its distribution satisfies the "Strong Rayleigh" property. This result has numerous consequences, including a provably fast-mixing Markov chain sampler that makes dual volume sampling much more attractive to practitioners. This sampler is closely related to classical algorithms for popular experimental design methods that are to date lacking theoretical analysis but are known to empirically work well.
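To make the sampling distribution described above concrete, here is a minimal brute-force sketch. It assumes the commonly used squared-volume form, P(S) proportional to det(A_S A_S^T), where A_S keeps the columns indexed by S; the abstract itself only says "proportional to the volume", so treat that exact form as an assumption. The function name `dual_volume_distribution` is hypothetical, and the exhaustive enumeration is exponential in m, so this illustrates the definition only and is not the polynomial time algorithm the paper develops.

```python
import itertools
import math

import numpy as np


def dual_volume_distribution(A, k):
    """Enumerate all k-column subsets S of an n x m matrix A (n <= k <= m).

    Returns (probs, Z), where probs maps each subset (a tuple of column
    indices) to det(A_S A_S^T) / Z and Z is the normalizing constant.
    Assumption: the squared-volume weighting det(A_S A_S^T).
    """
    n, m = A.shape
    if not (n <= k <= m):
        raise ValueError("need n <= k <= m")
    weights = {}
    for S in itertools.combinations(range(m), k):
        A_S = A[:, list(S)]                      # n x k submatrix of selected columns
        weights[S] = float(np.linalg.det(A_S @ A_S.T))
    Z = sum(weights.values())
    probs = {S: w / Z for S, w in weights.items()}
    return probs, Z


def sample_subset(probs, rng):
    """Draw one subset from the enumerated distribution."""
    subsets = list(probs)
    p = np.array([probs[S] for S in subsets])
    return subsets[rng.choice(len(subsets), p=p / p.sum())]
```

Under the squared-volume assumption, the Cauchy-Binet formula gives the normalizer in closed form, Z = C(m - n, k - n) det(A A^T), which is a convenient sanity check for small instances.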

Parallel Streaming Wasserstein Barycenters

Nov 14, 2017
Matthew Staib, Sebastian Claici, Justin Solomon, Stefanie Jegelka

Efficiently aggregating data from different sources is a challenging problem, particularly when samples from each source are distributed differently. These differences can be inherent to the inference task or present for other reasons: sensors in a sensor network may be placed far apart, affecting their individual measurements. Conversely, it is computationally advantageous to split Bayesian inference tasks across subsets of data, but data need not be identically distributed across subsets. One principled way to fuse probability distributions is via the lens of optimal transport: the Wasserstein barycenter is a single distribution that summarizes a collection of input measures while respecting their geometry. However, computing the barycenter scales poorly and requires discretization of all input distributions and the barycenter itself. Improving on this situation, we present a scalable, communication-efficient, parallel algorithm for computing the Wasserstein barycenter of arbitrary distributions. Our algorithm can operate directly on continuous input distributions and is optimized for streaming data. Our method is even robust to nonstationary input distributions and produces a barycenter estimate that tracks the input measures over time. The algorithm is semi-discrete, needing to discretize only the barycenter estimate. To the best of our knowledge, we also provide the first bounds on the quality of the approximate barycenter as the discretization becomes finer. Finally, we demonstrate the practical effectiveness of our method, both in tracking moving distributions on a sphere and in a large-scale Bayesian inference task.

* NIPS 2017

Distributional Adversarial Networks

Jul 09, 2017
Chengtao Li, David Alvarez-Melis, Keyulu Xu, Stefanie Jegelka, Suvrit Sra

We propose a framework for adversarial training that relies on a sample rather than a single sample point as the fundamental unit of discrimination. Inspired by discrepancy measures and two-sample tests between probability distributions, we propose two such distributional adversaries that operate and predict on samples, and show how they can be easily implemented on top of existing models. Various experimental results show that generators trained with our distributional adversaries are much more stable and are remarkably less prone to mode collapse than traditional models trained with pointwise prediction discriminators. The application of our framework to domain adaptation also results in considerable improvement over recent state-of-the-art.

Robust Budget Allocation via Continuous Submodular Functions

Jun 13, 2017
Matthew Staib, Stefanie Jegelka

The optimal allocation of resources for maximizing influence, spread of information or coverage, has gained attention in the past years, in particular in machine learning and data mining. But in applications, the parameters of the problem are rarely known exactly, and using wrong parameters can lead to undesirable outcomes. We hence revisit a continuous version of the Budget Allocation or Bipartite Influence Maximization problem introduced by Alon et al. (2012) from a robust optimization perspective, where an adversary may choose the least favorable parameters within a confidence set. The resulting problem is a nonconvex-concave saddle point problem (or game). We show that this nonconvex problem can be solved exactly by leveraging connections to continuous submodular functions, and by solving a constrained submodular minimization problem. Although constrained submodular minimization is hard in general, here, we establish conditions under which such a problem can be solved to arbitrary precision $\epsilon$.

* ICML 2017

Deep Metric Learning via Facility Location

Apr 11, 2017
Hyun Oh Song, Stefanie Jegelka, Vivek Rathod, Kevin Murphy

Learning the representation and the similarity metric in an end-to-end fashion with deep networks has demonstrated outstanding results for clustering and retrieval. However, these recent approaches still suffer from performance degradation stemming from the local metric training procedure, which is unaware of the global structure of the embedding space. We propose a global metric learning scheme for optimizing the deep metric embedding with the learnable clustering function and the clustering metric (NMI) in a novel structured prediction framework. Our experiments on the CUB200-2011, Cars196, and Stanford Online Products datasets show state-of-the-art performance on both the clustering and retrieval tasks, as measured by the NMI and Recall@K evaluation metrics.

* Submission accepted at CVPR 2017

Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling

Jan 08, 2017
Chengtao Li, Stefanie Jegelka, Suvrit Sra

We study probability measures induced by set functions with constraints. Such measures arise in a variety of real-world settings, where prior knowledge, resource limitations, or other pragmatic considerations impose constraints. We consider the task of rapidly sampling from such constrained measures, and develop fast Markov chain samplers for them. Our first main result is for MCMC sampling from Strongly Rayleigh (SR) measures, for which we present sharp polynomial bounds on the mixing time. As a corollary, this result gives a fast mixing sampler for Determinantal Point Processes (DPPs), which is (to our knowledge) the first provably fast MCMC sampler for DPPs since their inception over four decades ago. Beyond SR measures, we develop MCMC samplers for probabilistic models with hard constraints and identify sufficient conditions under which their chains mix rapidly. We illustrate our claims by empirically verifying the dependence of mixing times on the key factors governing our theoretical bounds.
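As a rough illustration of what an MCMC sampler for a DPP can look like, the following sketch runs a generic lazy Metropolis add/delete chain whose stationary distribution is proportional to det(L_S) for an L-ensemble kernel L. This is a textbook-style construction written for illustration; it is not claimed to be the authors' exact chain, and it says nothing about the mixing time bounds. The names `det_of_subset` and `dpp_add_delete_chain` are hypothetical, and L is assumed to be symmetric positive definite so that every subset has positive weight.

```python
import numpy as np


def det_of_subset(L, S):
    """Unnormalized DPP weight det(L_S); the empty set has weight 1."""
    if not S:
        return 1.0
    idx = sorted(S)
    return float(np.linalg.det(L[np.ix_(idx, idx)]))


def dpp_add_delete_chain(L, n_steps, seed=0):
    """Run a lazy Metropolis chain over subsets of {0, ..., n-1}.

    Each step: with probability 1/2 do nothing (laziness); otherwise pick
    an element t uniformly, propose toggling its membership, and accept
    with probability min(1, det(L_proposal) / det(L_current)).
    Assumption: L is symmetric positive definite.
    """
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    S = set()                                   # start from the empty set
    for _ in range(n_steps):
        if rng.random() < 0.5:                  # lazy step: stay put
            continue
        t = int(rng.integers(n))
        proposal = S ^ {t}                      # add t if absent, remove it if present
        ratio = det_of_subset(L, proposal) / det_of_subset(L, S)
        if rng.random() < min(1.0, ratio):
            S = proposal
    return S
```

Because the proposal (toggle a uniformly chosen element) is symmetric, the Metropolis acceptance rule makes det(L_S) the stationary weight; how many steps are needed in practice depends on the kernel and is the question the mixing time analysis addresses.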

* The present version subsumes arXiv:1607.03559

Focused Model-Learning and Planning for Non-Gaussian Continuous State-Action Systems

Oct 23, 2016
Zi Wang, Stefanie Jegelka, Leslie Pack Kaelbling, Tomás Lozano-Pérez

We introduce a framework for model learning and planning in stochastic domains with continuous state and action spaces and non-Gaussian transition models. It is efficient because (1) local models are estimated only when the planner requires them; (2) the planner focuses on the states most relevant to the current planning problem; and (3) the planner focuses on the most informative and/or high-value actions. Our theoretical analysis shows the validity and asymptotic optimality of the proposed approach. Empirically, we demonstrate the effectiveness of our algorithm on a simulated multi-modal pushing problem.

Fast Sampling for Strongly Rayleigh Measures with Application to Determinantal Point Processes

Jul 13, 2016
Chengtao Li, Stefanie Jegelka, Suvrit Sra

In this note we consider sampling from (non-homogeneous) strongly Rayleigh probability measures. As an important corollary, we obtain a fast mixing Markov Chain sampler for Determinantal Point Processes.
