Michael I. Jordan

Improved Oracle Complexity for Stochastic Compositional Variance Reduced Gradient

Jun 01, 2018
Tianyi Lin, Chenyou Fan, Mengdi Wang, Michael I. Jordan

We propose an accelerated stochastic compositional variance reduced gradient method for optimizing the sum of a composition function and a convex nonsmooth function. We provide an incremental first-order oracle (IFO) complexity analysis for the proposed algorithm and show that it is provably faster than all existing methods. Indeed, we show that our method achieves an asymptotic IFO complexity of $O\left((m+n)\log\left(1/\varepsilon\right)+1/\varepsilon^3\right)$, where $m$ and $n$ are the numbers of inner/outer component functions, improving the best-known result of $O\left(m+n+(m+n)^{2/3}/\varepsilon^2\right)$ and achieving the best-known linear run time for the convex composition problem. Experimental results on sparse mean-variance optimization with 21 real-world financial datasets confirm that our method outperforms other competing methods.

* arXiv admin note: text overlap with arXiv:1802.02339; correct typos, improve proof and add experiments on real datasets

Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

May 31, 2018
Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I. Jordan

We present a probabilistic framework for studying adversarial attacks on discrete data. Based on this framework, we derive a perturbation-based method, Greedy Attack, and a scalable learning-based method, Gumbel Attack, that illustrate various tradeoffs in the design of attacks. We demonstrate the effectiveness of these methods using both quantitative metrics and human evaluation on various state-of-the-art models for text classification, including a word-based CNN, a character-based CNN and an LSTM. As an example of our results, we show that the accuracy of character-based convolutional networks drops to the level of random selection when only five characters are modified through Greedy Attack.

* The first two authors contributed equally
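The abstract does not spell out how Gumbel Attack makes its discrete selections learnable. Judging from the name, we assume it relies on the Gumbel-softmax relaxation, which turns a categorical choice (for example, which token position to perturb) into a differentiable, nearly one-hot vector. The sketch below shows only that relaxation in NumPy; `gumbel_softmax`, `temperature` and the example scores are illustrative, not the authors' code.

```python
import numpy as np

def gumbel_softmax(logits, temperature=0.5, rng=None):
    """Relaxed one-hot sample from Categorical(softmax(logits)) along the last axis."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via inverse-CDF sampling; U is kept away from 0 to avoid log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    # Temperature-scaled softmax of the perturbed logits (max subtracted for stability).
    z = (logits + gumbel) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    weights = np.exp(z)
    return weights / weights.sum(axis=-1, keepdims=True)

# Example: a relaxed choice of which of five token positions to perturb.
rng = np.random.default_rng(0)
position_scores = np.array([0.1, 2.0, -1.0, 0.5, 0.3])
selection = gumbel_softmax(position_scores, temperature=0.1, rng=rng)
print(selection.round(3))
```

As `temperature` approaches zero, the output approaches the one-hot vector at the argmax of the perturbed logits, which is an exact categorical sample (the Gumbel-max trick); larger temperatures give smoother, lower-variance gradients.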

Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification

May 24, 2018
Max Simchowitz, Horia Mania, Stephen Tu, Michael I. Jordan, Benjamin Recht

We prove that the ordinary least-squares (OLS) estimator attains nearly minimax optimal performance for the identification of linear dynamical systems from a single observed trajectory. Our upper bound relies on a generalization of Mendelson's small-ball method to dependent data, eschewing the use of standard mixing-time arguments. Our lower bounds reveal that these upper bounds match up to logarithmic factors. In particular, we capture the correct signal-to-noise behavior of the problem, showing that more unstable linear systems are easier to estimate. This behavior is qualitatively different from arguments that rely on mixing-time calculations, which suggest that unstable systems are more difficult to estimate. We generalize our technique to provide bounds for a more general class of linear response time-series.
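The OLS estimator discussed above regresses each next state on the current state along one trajectory. A minimal NumPy sketch, assuming the model $x_{t+1} = A x_t + w_t$ with isotropic Gaussian noise and $x_0 = 0$; the helper names and the example matrix are illustrative, and the paper's error analysis is not reproduced.

```python
import numpy as np

def simulate_trajectory(A, T, noise_std=1.0, rng=None):
    """Roll out x_{t+1} = A x_t + w_t for T steps from x_0 = 0."""
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    X = np.zeros((T + 1, d))
    for t in range(T):
        X[t + 1] = A @ X[t] + noise_std * rng.standard_normal(d)
    return X

def ols_estimate(X):
    """A_hat = argmin_A sum_t ||x_{t+1} - A x_t||^2 from a single trajectory X."""
    past, future = X[:-1], X[1:]
    # Rows satisfy future[t] = A @ past[t] + noise, i.e. past @ A.T ~= future.
    A_T, *_ = np.linalg.lstsq(past, future, rcond=None)
    return A_T.T

A_true = np.array([[0.9, 0.2],
                   [0.0, 0.8]])
X = simulate_trajectory(A_true, T=5000, rng=np.random.default_rng(0))
A_hat = ols_estimate(X)
print(np.round(A_hat, 2))
```

Note that the estimator itself needs no burn-in or mixing step; the trajectory is used exactly as observed.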

On Nonlinear Dimensionality Reduction, Linear Smoothing and Autoencoding

Mar 06, 2018
Daniel Ting, Michael I. Jordan

We develop theory for nonlinear dimensionality reduction (NLDR). A number of NLDR methods have been developed, but there is limited understanding of how these methods work and how they relate to one another, and existing NLDR theory offers little basis for deriving new algorithms. We provide a novel framework for analyzing NLDR via a connection to the statistical theory of linear smoothers. This allows us both to understand existing methods and to derive new ones. We use this connection to smoothing to show that, asymptotically, existing NLDR methods correspond to discrete approximations of the solutions of sets of differential equations given a boundary condition. In particular, we can characterize many existing methods in terms of just three limiting differential operators and boundary conditions. Our theory also provides a way to assert that one method is preferable to another; indeed, we show that Local Tangent Space Alignment is superior within a class of methods that assume a global coordinate chart defines an isometric embedding of the manifold.

Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning

Feb 28, 2018
Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, Sergey Levine

Recent model-free reinforcement learning algorithms have proposed incorporating learned dynamics models as a source of additional data with the intention of reducing sample complexity. Such methods hold the promise of incorporating imagined data coupled with a notion of model uncertainty to accelerate the learning of continuous control tasks. Unfortunately, they rely on heuristics that limit usage of the dynamics model. We present model-based value expansion, which controls for uncertainty in the model by allowing imagination only to a fixed depth. By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning.
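The fixed-depth idea can be illustrated with an $H$-step value target: roll the learned model forward $H$ steps under the current policy, sum the discounted imagined rewards, and bootstrap with a value estimate at the last imagined state. A minimal sketch, assuming deterministic callables `policy`, `model` (returning an imagined reward and next state) and `value_fn`; these names and the toy chain below are hypothetical, and the full training loop is not shown.

```python
from typing import Callable, Tuple

def mve_target(
    s0: float,
    policy: Callable[[float], float],
    model: Callable[[float, float], Tuple[float, float]],
    value_fn: Callable[[float], float],
    horizon: int,
    gamma: float = 0.99,
) -> float:
    """H-step target: sum_{t<H} gamma^t * r_t + gamma^H * V(s_H), with imagined (r_t, s_{t+1})."""
    target, discount, s = 0.0, 1.0, s0
    for _ in range(horizon):
        a = policy(s)
        r, s = model(s, a)  # imagined reward and next state from the learned model
        target += discount * r
        discount *= gamma
    return target + discount * value_fn(s)  # bootstrap beyond the fixed depth

# Toy chain: every imagined step earns reward 1 and the critic predicts 10 everywhere.
print(mve_target(0.0, policy=lambda s: 1.0, model=lambda s, a: (1.0, s + a),
                 value_fn=lambda s: 10.0, horizon=3, gamma=0.5))  # 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
```

Setting `horizon=0` recovers the ordinary bootstrapped estimate `value_fn(s0)`; larger horizons lean more heavily on the model, which is the trade-off a fixed depth keeps in check.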

On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo

Feb 15, 2018
Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion. We analyze these methods under a uniform set of assumptions on the log-posterior distribution, assuming it to be smooth, strongly convex and Hessian Lipschitz. This is achieved by a new proof technique combining ideas from finite-sum optimization and the analysis of sampling methods. Our sharp theoretical bounds allow us to identify regimes of interest where each method performs better than the others. Our theory is verified with experiments on real-world and synthetic datasets.

* 37 pages; 4 figures
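For intuition about one of the methods analyzed, here is a minimal one-dimensional sketch of an SVRG-corrected Langevin sampler, assuming the negative log-posterior is a sum $U(x) = \sum_{i=1}^{n} f_i(x)$: each epoch computes the full gradient at a snapshot, and each step corrects a minibatch gradient with that snapshot before taking an overdamped Langevin step. The function and parameter names and the toy Gaussian posterior are illustrative; step sizes and epoch lengths from the paper's analysis are not reproduced.

```python
import numpy as np

def svrg_langevin(grad_terms, n, x0, step, n_epochs, epoch_len, batch_size, rng):
    """SVRG-corrected overdamped Langevin sampler for U(x) = sum_i f_i(x), x scalar.

    grad_terms(x, idx) must return the array [f_i'(x) for i in idx].
    """
    x = float(x0)
    samples = []
    for _ in range(n_epochs):
        snapshot = x
        full_grad = grad_terms(snapshot, np.arange(n)).sum()  # exact U'(snapshot)
        for _ in range(epoch_len):
            idx = rng.integers(0, n, size=batch_size)
            # Unbiased estimate of U'(x); its variance shrinks as x approaches the snapshot.
            g = full_grad + (n / batch_size) * (grad_terms(x, idx) - grad_terms(snapshot, idx)).sum()
            x = x - step * g + np.sqrt(2.0 * step) * rng.standard_normal()
            samples.append(x)
    return np.array(samples)

# Toy posterior: f_i(x) = a_i (x - y_i)^2 / 2, so the target is Gaussian with
# precision sum(a) and mean sum(a * y) / sum(a).
rng = np.random.default_rng(0)
n = 100
a = rng.uniform(0.5, 1.5, size=n)
y = rng.normal(2.0, 1.0, size=n)
grad_terms = lambda x, idx: a[idx] * (x - y[idx])
samples = svrg_langevin(grad_terms, n, x0=0.0, step=1e-3, n_epochs=200,
                        epoch_len=100, batch_size=10, rng=rng)[1000:]
post_mean, post_var = (a * y).sum() / a.sum(), 1.0 / a.sum()
print(samples.mean(), post_mean, samples.var(), post_var)
```

The first 1000 samples are discarded as burn-in; the remaining discretization bias in the variance is of order `step * sum(a)`.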

Conditional Adversarial Domain Adaptation

Feb 10, 2018
Mingsheng Long, Zhangjie Cao, Jianmin Wang, Michael I. Jordan

Adversarial learning has been embedded into deep networks to learn transferable representations for domain adaptation. Existing adversarial domain adaptation methods may struggle to align different domains of multimodal distributions, which are native to classification problems. In this paper, we present conditional adversarial domain adaptation, a novel framework that conditions the adversarial adaptation models on discriminative information conveyed in the classifier predictions. Conditional domain adversarial networks are proposed to enable discriminative adversarial adaptation of multimodal domains. Experiments show that the proposed approaches exceed state-of-the-art performance on three domain adaptation datasets.

* arXiv admin note: text overlap with arXiv:1605.06636

Underdamped Langevin MCMC: A non-asymptotic analysis

Jan 26, 2018
Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan

We study the underdamped Langevin diffusion when the log of the target distribution is smooth and strongly concave. We present an MCMC algorithm based on its discretization and show that it achieves $\varepsilon$ error (in 2-Wasserstein distance) in $\mathcal{O}(\sqrt{d}/\varepsilon)$ steps. This is a significant improvement over the best known rate for overdamped Langevin MCMC, which is $\mathcal{O}(d/\varepsilon^2)$ steps under the same smoothness/concavity assumptions. The underdamped Langevin MCMC scheme can be viewed as a version of Hamiltonian Monte Carlo (HMC), which has been observed to outperform overdamped Langevin MCMC methods in a number of application areas. We provide quantitative rates that support this empirical wisdom.

* 23 pages; Correction to Corollary 7
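The dynamics behind this sampler can be sketched as follows, assuming the parameterization $dv = -\gamma v\,dt - u\nabla f(x)\,dt + \sqrt{2\gamma u}\,dB_t$, $dx = v\,dt$, with $f = -\log p^*$ for the target $p^*$. The update below is a plain semi-implicit Euler discretization chosen for brevity, not the paper's scheme, so it should not be read as carrying the $\mathcal{O}(\sqrt{d}/\varepsilon)$ guarantee. `gamma`, `u` and the Gaussian example are illustrative.

```python
import numpy as np

def underdamped_langevin(grad_f, x0, step, n_steps, gamma=2.0, u=1.0, rng=None):
    """Euler-type simulation of underdamped Langevin dynamics.

    Each row of x0 is an independent chain; velocities start at zero.
    The x-marginal of the continuous dynamics is proportional to exp(-f(x)).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    noise_scale = np.sqrt(2.0 * gamma * u * step)
    for _ in range(n_steps):
        # Velocity: friction, force from the potential f, and injected Gaussian noise.
        v = v - step * (gamma * v + u * grad_f(x)) + noise_scale * rng.standard_normal(x.shape)
        # Position moves with the freshly updated velocity (semi-implicit Euler).
        x = x + step * v
    return x

# Target: Gaussian N(mu, I) in two dimensions, i.e. f(x) = ||x - mu||^2 / 2.
mu = np.array([1.0, -1.0])
final = underdamped_langevin(lambda x: x - mu, x0=np.zeros((4000, 2)), step=0.05,
                             n_steps=400, rng=np.random.default_rng(0))
print(final.mean(axis=0), final.var(axis=0))
```

Running many short chains in parallel and keeping only their final states gives approximately independent samples, which makes the mean and variance easy to check against the target.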

Stochastic Cubic Regularization for Fast Nonconvex Optimization

Dec 05, 2017
Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, Michael I. Jordan

This paper proposes a stochastic variant of a classic algorithm, the cubic-regularized Newton method [Nesterov and Polyak 2006]. The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\tilde{\mathcal{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations. The latter can be computed as efficiently as stochastic gradients. This improves upon the $\tilde{\mathcal{O}}(\epsilon^{-4})$ rate of stochastic gradient descent. Our rate matches the best-known result for finding local minima without requiring any delicate acceleration or variance-reduction techniques.

* The first two authors contributed equally
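To show why Hessian-vector products suffice, here is a minimal sketch of one cubic-regularized step: given a gradient $g$ and a Hessian-vector oracle, it approximately minimizes the model $m(s) = g^\top s + \tfrac{1}{2} s^\top H s + \tfrac{\rho}{6}\|s\|^3$ by plain gradient descent, so $H$ is never formed. We assume gradient descent as the subsolver for illustration; `cubic_step`, `lr` and `n_iters` are hypothetical, and the stochastic sampling of $g$ and $H$ is omitted.

```python
import numpy as np

def cubic_step(g, hvp, rho, lr=0.05, n_iters=2000):
    """Approximately minimize m(s) = g.s + 0.5 s.Hs + (rho/6)||s||^3 by gradient descent.

    hvp(s) must return H @ s; the Hessian itself is never materialized.
    """
    s = np.zeros_like(g, dtype=float)
    for _ in range(n_iters):
        # grad m(s) = g + H s + (rho / 2) ||s|| s
        grad_m = g + hvp(s) + 0.5 * rho * np.linalg.norm(s) * s
        s = s - lr * grad_m
    return s

def cubic_model(s, g, hvp, rho):
    """Value of the cubic model m(s)."""
    return g @ s + 0.5 * s @ hvp(s) + rho / 6.0 * np.linalg.norm(s) ** 3

# Near a saddle: one positive and one negative curvature direction.
H = np.diag([1.0, -1.0])
g = np.array([0.1, 0.1])
s = cubic_step(g, lambda v: H @ v, rho=1.0)
print(s, cubic_model(s, g, lambda v: H @ v, rho=1.0))
```

The step moves mostly along the negative-curvature direction, which is how the cubic model escapes the saddle. Starting from $s = 0$ stalls if $g = 0$ exactly; a small random perturbation of the starting point is a common fix for that case.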

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent

Nov 28, 2017
Chi Jin, Praneeth Netrapalli, Michael I. Jordan

Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves a faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior to GD in the nonconvex setting remains open. This paper studies a simple variant of AGD, and shows that it escapes saddle points and finds a second-order stationary point in $\tilde{O}(1/\epsilon^{7/4})$ iterations, faster than the $\tilde{O}(1/\epsilon^{2})$ iterations required by GD. To the best of our knowledge, this is the first Hessian-free algorithm to find a second-order stationary point faster than GD, and also the first single-loop algorithm with a faster rate than GD even in the setting of finding a first-order stationary point. Our analysis is based on two key ideas: (1) the use of a simple Hamiltonian function, inspired by a continuous-time perspective, which AGD monotonically decreases per step even for nonconvex functions, and (2) a novel framework called improve or localize, which is useful for tracking the long-term behavior of gradient-based optimization algorithms. We believe that these techniques may deepen our understanding of both acceleration algorithms and nonconvex optimization.
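The momentum update at the heart of AGD can be sketched as follows. This is textbook Nesterov AGD with constant momentum on a strongly convex quadratic, illustrating the convex-setting speedup the abstract starts from; the paper's variant adds perturbation and negative-curvature steps that are not shown. The step size $1/L$ and momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ are standard choices assumed here, not values taken from the paper.

```python
import numpy as np

def gradient_descent(grad, x0, step, n_iters):
    """Plain GD: x_{k+1} = x_k - step * grad(x_k)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step * grad(x)
    return x

def accelerated_gradient_descent(grad, x0, step, momentum, n_iters):
    """Nesterov AGD with constant momentum:
    y_k = x_k + momentum * (x_k - x_{k-1});  x_{k+1} = y_k - step * grad(y_k).
    """
    x = np.array(x0, dtype=float)
    x_prev = x.copy()
    for _ in range(n_iters):
        y = x + momentum * (x - x_prev)
        x_prev, x = x, y - step * grad(y)
    return x

# Ill-conditioned quadratic f(x) = 0.5 * x^T diag(1, 0.01) x, so L = 1 and kappa = 100.
curv = np.array([1.0, 0.01])
f = lambda x: 0.5 * np.sum(curv * x ** 2)
grad = lambda x: curv * x
kappa = curv.max() / curv.min()
momentum = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
x_gd = gradient_descent(grad, [1.0, 1.0], step=1.0, n_iters=200)
x_agd = accelerated_gradient_descent(grad, [1.0, 1.0], step=1.0, momentum=momentum, n_iters=200)
print(f(x_gd), f(x_agd))
```

Along the low-curvature direction, GD contracts by a factor of about $1 - 1/\kappa$ per step while AGD contracts by about $1 - 1/\sqrt{\kappa}$, which is why AGD ends many orders of magnitude closer to the minimum after the same 200 iterations.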
