Lorenzo Rosasco

Generalization Properties of Learning with Random Features

Jan 31, 2018
Alessandro Rudi, Lorenzo Rosasco

We study the generalization properties of ridge regression with random features in the statistical learning framework. We show for the first time that $O(1/\sqrt{n})$ learning bounds can be achieved with only $O(\sqrt{n}\log n)$ random features rather than $O(n)$ as suggested by previous results. Further, we prove faster learning rates and show that they might require more random features, unless they are sampled according to a possibly problem-dependent distribution. Our results shed light on the statistical-computational trade-offs in large-scale kernelized learning, showing the potential effectiveness of random features in reducing the computational complexity while keeping optimal generalization properties.
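
As a rough illustration of the setting in this abstract, the following sketch fits ridge regression on random Fourier features with the number of features set to about $\sqrt{n}\log n$. The Gaussian kernel, the synthetic data, the choice $\lambda = 1/\sqrt{n}$, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map X (n, d) to M random Fourier features of a Gaussian kernel."""
    M = W.shape[1]
    return np.sqrt(2.0 / M) * np.cos(X @ W + b)

def fit_rf_ridge(X, y, M, lam, sigma=1.0, seed=0):
    """Ridge regression on M random features (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Frequencies drawn from the spectral density of a Gaussian kernel of width sigma.
    W = rng.normal(scale=1.0 / sigma, size=(d, M))
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)
    Z = random_fourier_features(X, W, b)
    # Solve (Z^T Z / n + lam I) w = Z^T y / n, an M x M system instead of n x n.
    A = Z.T @ Z / n + lam * np.eye(M)
    w = np.linalg.solve(A, Z.T @ y / n)
    return W, b, w

def predict(X, W, b, w):
    return random_fourier_features(X, W, b) @ w

# Synthetic 1-D regression problem (assumed for the demo).
n = 2000
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

M = int(np.ceil(np.sqrt(n) * np.log(n)))  # about sqrt(n) log n features
lam = 1.0 / np.sqrt(n)                    # assumed regularization level
W, b, w = fit_rf_ridge(X, y, M, lam)

X_test = np.linspace(-3, 3, 200)[:, None]
test_mse = np.mean((predict(X_test, W, b, w) - np.sin(X_test[:, 0])) ** 2)
```

The linear system solved here has size $M \times M$ with $M$ much smaller than $n$, which is where the computational saving over exact kernel ridge regression comes from.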

* NIPS 2017

Theory of Deep Learning III: explaining the non-overfitting puzzle

Jan 16, 2018
Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar

A main puzzle of deep networks revolves around the absence of overfitting despite large overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics associated with gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to a linear gradient system in a quadratic potential with a degenerate (for square loss) or almost degenerate (for logistic or cross-entropy loss) Hessian. The proposition depends on the qualitative theory of dynamical systems and is supported by numerical results. Our main propositions extend to deep nonlinear networks two properties of gradient descent for linear networks that have been recently established (1) to be key to their generalization properties: 1. Gradient descent enforces a form of implicit regularization controlled by the number of iterations, and asymptotically converges to the minimum norm solution for appropriate initial conditions of gradient descent. This implies that there is usually an optimum early stopping that avoids overfitting of the loss. This property, valid for the square loss and many other loss functions, is relevant especially for regression. 2. For classification, the asymptotic convergence to the minimum norm solution implies convergence to the maximum margin solution, which guarantees good classification error for "low noise" datasets. This property holds for loss functions such as the logistic and cross-entropy loss independently of the initial conditions. The robustness to overparametrization has suggestive implications for the robustness of the architecture of deep convolutional networks with respect to the curse of dimensionality.
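
The first property mentioned in this abstract can be checked numerically in the linear case: for an overparametrized least-squares problem, gradient descent started from zero converges to the minimum norm interpolating solution. The sketch below assumes a random Gaussian design and a zero initialization (one instance of the "appropriate initial conditions"); it illustrates the linear case only and says nothing about deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # more parameters than samples
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Gradient descent on 0.5 * ||X w - y||^2, started at zero.
w = np.zeros(d)
step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / (largest singular value)^2
for _ in range(2000):
    w -= step * X.T @ (X @ w - y)

# Minimum norm solution via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

gap = np.linalg.norm(w - w_min_norm)     # distance between the two solutions
train_residual = np.linalg.norm(X @ w - y)
```

Starting from zero keeps every iterate in the row space of `X`, which is why the limit coincides with the pseudoinverse solution; a different starting point would add a component in the null space of `X` that gradient descent never removes.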

Optimal Rates for Multi-pass Stochastic Gradient Methods

Oct 21, 2017
Junhong Lin, Lorenzo Rosasco

We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. We study how regularization properties are controlled by the step-size, the number of passes and the mini-batch size. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases. As a byproduct, we derive optimal convergence results for batch gradient methods (even in the non-attainable cases).
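
To make the role of the number of passes concrete, here is a minimal sketch of multi-pass mini-batch SGD for least squares in which the pass count is selected on held-out data. The linear model, the fixed step-size, sampling by random permutation at each pass, and the hold-out selection rule are all assumptions for this demo and may differ from the scheme analyzed in the paper.

```python
import numpy as np

def multipass_sgd(X, y, X_val, y_val, step, batch_size, n_passes, seed=0):
    """Mini-batch SGD on the square loss; returns the best iterate on validation data."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    val_errors = []
    best_w, best_err = w.copy(), np.inf
    for _ in range(n_passes):
        perm = rng.permutation(n)                 # one pass = one shuffled sweep
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= step * grad
        err = np.mean((X_val @ w - y_val) ** 2)   # validation error after this pass
        val_errors.append(err)
        if err < best_err:
            best_w, best_err = w.copy(), err
    return best_w, np.array(val_errors)

# Synthetic noisy linear regression (assumed for the demo).
rng = np.random.default_rng(0)
n, n_val, d = 100, 200, 50
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d)) / np.sqrt(d)
X_val = rng.normal(size=(n_val, d)) / np.sqrt(d)
y = X @ w_star + 0.5 * rng.normal(size=n)
y_val = X_val @ w_star + 0.5 * rng.normal(size=n_val)

best_w, val_errors = multipass_sgd(X, y, X_val, y_val,
                                   step=0.5, batch_size=10, n_passes=50)
best_pass = int(np.argmin(val_errors)) + 1        # early-stopping point, in passes
```

No explicit penalty term appears anywhere in the update; the only regularization knob in this sketch is where along `val_errors` the iteration is stopped.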

* Extended versions of the previous one. Fixed some typos, JMLR, 2017

Optimal Rates for Learning with Nyström Stochastic Gradient Methods

Oct 21, 2017
Junhong Lin, Lorenzo Rosasco

In the setting of nonparametric regression, we propose and study a combination of stochastic gradient methods with Nyström subsampling, allowing multiple passes over the data and mini-batches. Generalization error bounds for the studied algorithm are provided. In particular, optimal learning rates are derived considering different possible choices of the step-size, the mini-batch size, the number of iterations/passes, and the subsampling level. In comparison with state-of-the-art algorithms such as the classic stochastic gradient methods and kernel ridge regression with Nyström, the studied algorithm has advantages in computational complexity, while achieving the same optimal learning rates. Moreover, our results indicate that using mini-batches can reduce the total computational cost while achieving the same optimal statistical results.
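
The combination described in this abstract can be sketched loosely as follows: pick $m$ training points uniformly at random as Nyström centers, restrict the predictor to the span of the kernel functions at those centers, and run mini-batch SGD on the $m$ coefficients. This is a simplified variant written for illustration; the Gaussian kernel, the plain Euclidean update on the coefficients, the step-size $1/m$, and all names are assumptions, and the iteration analyzed in the paper may differ.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def nystrom_sgd(X, y, m, step, batch_size, n_passes, seed=0):
    """Mini-batch SGD on coefficients over m uniformly subsampled centers."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=m, replace=False)]  # Nystrom subsampling
    alpha = np.zeros(m)
    for _ in range(n_passes):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            K_b = gaussian_kernel(X[idx], centers)     # (batch, m) kernel block
            residual = K_b @ alpha - y[idx]
            alpha -= step * K_b.T @ residual / len(idx)
    return centers, alpha

# Synthetic 1-D regression problem (assumed for the demo).
rng = np.random.default_rng(1)
n, m = 500, 20
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

centers, alpha = nystrom_sgd(X, y, m=m, step=1.0 / m, batch_size=10, n_passes=20)
train_mse = np.mean((gaussian_kernel(X, centers) @ alpha - y) ** 2)
baseline_mse = np.mean(y ** 2)                         # error of the zero predictor
```

Each update only touches a `batch_size` by `m` kernel block, so neither the full $n \times n$ kernel matrix nor the $n \times m$ one is ever stored.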

* 41 pages, 6 figures

Solving $\ell^p\!$-norm regularization with tensor kernels

Oct 18, 2017
Saverio Salzo, Johan A. K. Suykens, Lorenzo Rosasco

In this paper, we discuss how a suitable family of tensor kernels can be used to efficiently solve nonparametric extensions of $\ell^p$ regularized learning methods. Our main contribution is proposing a fast dual algorithm and showing that it allows the problem to be solved efficiently. Our results contrast with recent findings suggesting that kernel methods cannot be extended beyond the Hilbert setting. Numerical experiments confirm the effectiveness of the method.

Are we Done with Object Recognition? The iCub robot's Perspective

Sep 28, 2017
Giulia Pasquale, Carlo Ciliberto, Francesca Odone, Lorenzo Rosasco, Lorenzo Natale

We report on an extensive study of the current benefits and limitations of deep learning approaches to robot vision and introduce a novel dataset used for our investigation. To avoid the biases in currently available datasets, we consider a human-robot interaction setting to design a data-acquisition protocol for visual object recognition on the iCub humanoid robot. Considering the performance of off-the-shelf models trained on off-line large-scale image retrieval datasets, we show the necessity for knowledge transfer. Indeed, we analyze different ways in which this last step can be done, and identify the major bottlenecks in robotics scenarios. By studying both object categorization and identification tasks, we highlight the key differences between object recognition in robotics and in image retrieval tasks, for which the considered deep learning approaches were originally designed. In a nutshell, our results confirm, also in the considered setting, the remarkable improvements yielded by deep learning, while pointing to specific open challenges that need to be addressed for seamless deployment in robotics.

* 21 pages + supplementary material

Consistent Multitask Learning with Nonlinear Output Relations

Aug 10, 2017
Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco, Massimiliano Pontil

Key to multitask learning is exploiting relationships between different tasks to improve prediction performance. If the relations are linear, regularization approaches can be used successfully. However, in practice assuming the tasks to be linearly related might be restrictive, and allowing for nonlinear structures is a challenge. In this paper, we tackle this issue by casting the problem within the framework of structured prediction. Our main contribution is a novel algorithm for learning multiple tasks which are related by a system of nonlinear equations that their joint outputs need to satisfy. We show that the algorithm is consistent and can be efficiently implemented. Experimental results show the potential of the proposed method.

* 25 pages, 1 figure, 2 tables

Convergence of the Forward-Backward Algorithm: Beyond the Worst Case with the Help of Geometry

Aug 01, 2017
Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa

We provide a comprehensive study of the convergence of the forward-backward algorithm under suitable geometric conditions leading to fast rates. We present several new results and collect in a unified view a variety of results scattered in the literature, often providing simplified proofs. Novel contributions include the analysis of infinite dimensional convex minimization problems, allowing the case where minimizers might not exist. Further, we analyze the relation between different geometric conditions, and discuss novel connections with a priori conditions in linear inverse problems, including source conditions, restricted isometry properties and partial smoothness.
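
For readers unfamiliar with the forward-backward algorithm, a standard finite-dimensional instance is iterative soft-thresholding for the lasso: a gradient (forward) step on the smooth least-squares term followed by a proximal (backward) step on the $\ell^1$ penalty. The sketch below is that textbook instance under assumed synthetic data; it does not reproduce the geometric analysis or the infinite dimensional setting of the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the backward step for the lasso)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward_lasso(A, b, lam, n_iters=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by forward-backward splitting."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    objective = []
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)                          # forward (gradient) step
        x = soft_threshold(x - step * grad, step * lam)   # backward (proximal) step
        objective.append(0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x)))
    return x, np.array(objective)

# Sparse recovery problem (assumed for the demo).
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.normal(size=40)

x_hat, objective = forward_backward_lasso(A, b, lam=0.5)
```

With the step-size $1/L$ used here, the objective values are non-increasing along the iterations, which makes `objective` a convenient quantity to plot when comparing convergence behavior across problems.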

A Consistent Regularization Approach for Structured Prediction

Jul 28, 2017
Carlo Ciliberto, Alessandro Rudi, Lorenzo Rosasco

We propose and analyze a regularization approach for structured prediction problems. We characterize a large class of loss functions that allows structured outputs to be naturally embedded in a linear space. We exploit this fact to design learning algorithms using a surrogate loss approach and regularization techniques. We prove universal consistency and finite sample bounds characterizing the generalization properties of the proposed methods. Experimental results are provided to demonstrate the practical usefulness of the proposed approach.

* 39 pages, 2 Tables, 1 Figure

Don't relax: early stopping for convex regularization

Jul 18, 2017
Simon Matet, Lorenzo Rosasco, Silvia Villa, Bang Long Vu

We consider the problem of designing efficient regularization algorithms when regularization is encoded by a (strongly) convex functional. Unlike classical penalization methods based on a relaxation approach, we propose an iterative method where regularization is achieved via early stopping. Our results show that the proposed procedure achieves the same recovery accuracy as penalization methods, while naturally integrating computational considerations. An empirical analysis on a number of problems provides promising results with respect to the state of the art.
