Behnam Neyshabur

Corralling a Band of Bandit Algorithms

Jun 06, 2017

Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

Abstract: We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm would if it were run on its own. The main challenge is that when run with a master, base algorithms unavoidably receive much less feedback, and it is thus critical that the master not starve a base algorithm that might perform uncompetitively initially but would eventually outperform others if given enough feedback. We address this difficulty by devising a version of Online Mirror Descent with a special mirror map together with a sophisticated learning rate scheme. We show that this approach manages to achieve a more delicate balance between exploiting and exploring base algorithms than previous works, yielding superior regret bounds. Our results are applicable to many settings, such as multi-armed bandits, contextual bandits, and convex bandits. As examples, we present two main applications. The first is to create an algorithm that enjoys worst-case robustness while at the same time performing much better when the environment is relatively easy. The second is to create an algorithm that works simultaneously under different assumptions about the environment, such as different priors or different loss structures.

* Accepted to COLT 2017

Implicit Regularization in Matrix Factorization

May 25, 2017

Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

Abstract: We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
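
The conjecture in this abstract can be written out as a short formal sketch. The abstract names only the matrix $X$; every other symbol here is an assumption introduced for illustration: the linear measurement operator $\mathcal{A}$ with measurement matrices $A_i$, the targets $y$, the factor $U$ with $X = UU^\top$ (so $X$ is positive semidefinite), the step size $\eta$, and the dimensions $m$ and $n$. The last line is the conjectured limit, not an established identity.

```latex
\begin{aligned}
F(X) &= \lVert \mathcal{A}(X) - y \rVert_2^2,
  \qquad \mathcal{A}(X)_i = \langle A_i, X \rangle,\quad i = 1, \dots, m, \\
f(U) &= F(UU^\top), \qquad U \in \mathbb{R}^{n \times n}, \\
U_{t+1} &= U_t - \eta \, \nabla f(U_t), \qquad U_0 \approx 0, \\
\lim_{t \to \infty} U_t U_t^\top &\overset{?}{=}
  \arg\min_{X \succeq 0} \lVert X \rVert_* \quad \text{s.t. } \mathcal{A}(X) = y .
\end{aligned}
```

"Underdetermined" here means that $m$ is small enough for many matrices $X$ to satisfy $\mathcal{A}(X) = y$, so which of them gradient descent reaches depends on the factorization, the step size, and the initialization rather than on the objective alone.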

Geometry of Optimization and Implicit Regularization in Deep Learning

May 08, 2017

Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro

Abstract: We argue that optimization plays a crucial role in the generalization of deep learning models through implicit regularization. We do this by demonstrating that generalization ability is not controlled by network size but rather by some other implicit control. We then demonstrate how changing the empirical optimization procedure can improve generalization, even if actual optimization quality is not affected. We do so by studying the geometry of the parameter space of deep networks, and devising an optimization algorithm attuned to this geometry.

* This survey chapter was written as part of the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI) "Why & When Deep Learning works -- looking inside Deep Learning" compendium, with the generous support of ICRI-CI. arXiv admin note: substantial text overlap with arXiv:1506.02617

Global Optimality of Local Search for Low Rank Matrix Recovery

May 27, 2016

Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

Abstract: We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements we show all local minima are very close to a global optimum. Together with a curvature bound at saddle points, this yields a polynomial time global convergence guarantee for stochastic gradient descent from random initialization.

* 21 pages, 3 figures

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

May 23, 2016

Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro

Abstract: We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve the trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.

* 15 pages

Data-Dependent Path Normalization in Neural Networks

Jan 19, 2016

Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro

Abstract: We propose a unified framework for neural net normalization, regularization and optimization, which includes Path-SGD and Batch-Normalization and interpolates between them across two different dimensions. Through this framework we investigate the issue of invariance of the optimization, data dependence and the connection with natural gradients.

* 17 pages, 3 figures

On Symmetric and Asymmetric LSHs for Inner Product Search

Jun 08, 2015

Behnam Neyshabur, Nathan Srebro

Abstract: We consider the problem of designing locality sensitive hashes (LSH) for inner product similarity, and the power of asymmetric hashes in this context. Shrivastava and Li argue that there is no symmetric LSH for the problem and propose an asymmetric LSH based on different mappings for query and database points. However, we show there does exist a simple symmetric LSH that enjoys stronger guarantees and better empirical performance than the asymmetric LSH they suggest. We also show a variant of the setting where asymmetry is in fact needed, but where a different asymmetric LSH is required.
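
The abstract does not spell out the symmetric construction, so the following is only a minimal sketch of one way such a hash could look, under stated assumptions: vectors are scaled to norm at most 1, queries have unit norm, each vector is augmented with one extra coordinate so that it lands on the unit sphere, and the hash bits are signs of random projections. The names `augment` and `make_sign_hash` are hypothetical and chosen for illustration. The same map is applied to queries and database points, which is what makes the scheme symmetric.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def augment(v):
    """Append sqrt(1 - ||v||^2) so the result has unit norm (needs ||v|| <= 1)."""
    sq = dot(v, v)
    if sq > 1.0 + 1e-12:
        raise ValueError("vector norm must be at most 1")
    return list(v) + [math.sqrt(max(0.0, 1.0 - sq))]


def make_sign_hash(dim, num_bits, seed=0):
    """Build a hash that augments a vector, then takes signs of random projections.

    `dim` is the dimension AFTER augmentation (input dimension + 1).
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

    def hash_fn(v):
        p = augment(v)
        return tuple(1 if dot(plane, p) >= 0.0 else 0 for plane in planes)

    return hash_fn


# Illustrative data: database points with norm <= 1 and a unit-norm query.
database = [[0.6, 0.0], [0.1, 0.2], [-0.3, 0.4]]
query = [1.0, 0.0]

hash_fn = make_sign_hash(dim=3, num_bits=16, seed=0)
codes = [hash_fn(x) for x in database]
query_code = hash_fn(query)

for x, code in zip(database, codes):
    matches = sum(b1 == b2 for b1, b2 in zip(code, query_code))
    print(x, "inner product:", dot(x, query), "matching bits:", matches)
```

For a unit-norm query the extra coordinate is zero, so the inner product between the augmented vectors equals the original inner product, and the angle between them (which sign random projections are sensitive to) is determined by that inner product alone.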

* 11 pages, 3 figures, In Proceedings of The 32nd International Conference on Machine Learning (ICML)

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

Jun 08, 2015

Behnam Neyshabur, Ruslan Salakhutdinov, Nathan Srebro

Abstract: We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.
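
The rescaling invariance mentioned in this abstract can be illustrated with a small sketch. It assumes a two-layer ReLU network with no biases; the weights, the input, and the helper names (`rescale_hidden_unit`, `squared_path_norm`) are hypothetical. Multiplying the incoming weights of a hidden unit by c > 0 and dividing its outgoing weights by c leaves the network output unchanged. A path-wise quantity (here, the sum over input-to-output paths of the squared product of weights, used as a stand-in for the regularizer the abstract refers to) is also unchanged, while the ordinary squared L2 norm of the weights is not.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def forward(w1, w2, x):
    """Two-layer ReLU network; w1 is hidden-by-input, w2 is output-by-hidden."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def rescale_hidden_unit(w1, w2, j, c):
    """Scale incoming weights of hidden unit j by c > 0 and outgoing ones by 1/c."""
    if c <= 0:
        raise ValueError("c must be positive for the ReLU output to be preserved")
    new_w1 = [[w * c for w in row] if idx == j else list(row)
              for idx, row in enumerate(w1)]
    new_w2 = [[w / c if idx == j else w for idx, w in enumerate(row)]
              for row in w2]
    return new_w1, new_w2


def squared_path_norm(w1, w2):
    """Sum over input-hidden-output paths of the squared product of weights."""
    return sum((out_row[j] * w1[j][i]) ** 2
               for out_row in w2
               for j in range(len(w1))
               for i in range(len(w1[j])))


def squared_l2_norm(w1, w2):
    """Ordinary sum of squared weights, which is NOT rescaling-invariant."""
    return sum(w * w for row in w1 + w2 for w in row)


# Illustrative weights: 2 inputs, 3 hidden units, 1 output.
w1 = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]
w2 = [[1.0, -2.0, 0.5]]
x = [1.0, 2.0]

r1, r2 = rescale_hidden_unit(w1, w2, j=1, c=4.0)

print("output:     ", forward(w1, w2, x), forward(r1, r2, x))
print("path norm^2:", squared_path_norm(w1, w2), squared_path_norm(r1, r2))
print("L2 norm^2:  ", squared_l2_norm(w1, w2), squared_l2_norm(r1, r2))
```

Because the two weight settings compute the same function but have different Euclidean geometry, a plain gradient step treats them differently; that mismatch is the motivation the abstract gives for optimizing in a rescaling-invariant geometry.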

* 12 pages, 5 figures

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

Apr 16, 2015

Behnam Neyshabur, Ryota Tomioka, Nathan Srebro

Abstract: We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.

* 9 pages, 2 figures

Norm-Based Capacity Control in Neural Networks

Apr 14, 2015

Behnam Neyshabur, Ryota Tomioka, Nathan Srebro

Abstract: We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.

* 29 pages
