Lorenzo Rosasco

Iterative regularization for low complexity regularizers

Feb 01, 2022

Cesare Molinari, Mathurin Massias, Lorenzo Rosasco, Silvia Villa

Abstract: Iterative regularization exploits the implicit bias of an optimization algorithm to regularize ill-posed problems. Constructing algorithms with such built-in regularization mechanisms is a classic challenge in inverse problems and also in modern machine learning, where it provides both a new perspective on algorithm analysis and significant speed-ups compared to explicit regularization. In this work, we propose and study the first iterative regularization procedure able to handle biases described by nonsmooth and non-strongly-convex functionals, which are prominent in low-complexity regularization. Our approach is based on a primal-dual algorithm, for which we analyze convergence and stability properties, even in the case where the original problem is infeasible. The general results are illustrated by considering the special case of sparse recovery with the $\ell_1$ penalty. Our theoretical results are complemented by experiments showing the computational benefits of our approach.

Nyström Kernel Mean Embeddings

Jan 31, 2022

Antoine Chatalic, Nicolas Schreuder, Alessandro Rudi, Lorenzo Rosasco

Abstract: Kernel mean embeddings are a powerful tool to represent probability distributions over arbitrary spaces as single points in a Hilbert space. Yet, the cost of computing and storing such embeddings prohibits their direct use in large-scale settings. We propose an efficient approximation procedure based on the Nyström method, which exploits a small random subset of the dataset. Our main result is an upper bound on the approximation error of this procedure. It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate while reducing computational costs. We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules, and illustrate our theoretical findings with numerical experiments.
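The idea of approximating a kernel mean embedding from a small subset of the data can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel`, `nystrom_mean_embedding`, and `evaluate_embedding` are all hypothetical choices. The empirical embedding is projected onto the span of the kernel functions centered at the chosen landmarks, which gives weights $\alpha = K_{mm}^{+} K_{mn} \mathbf{1} / n$.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def nystrom_mean_embedding(X, landmark_idx, bandwidth=1.0):
    """Project the empirical mean embedding onto span{k(z_j, .)}.

    Returns the landmarks Z and weights alpha such that the approximate
    embedding is sum_j alpha_j * k(Z[j], .).
    """
    Z = X[landmark_idx]
    K_mm = gaussian_kernel(Z, Z, bandwidth)   # m x m landmark Gram matrix
    K_mn = gaussian_kernel(Z, X, bandwidth)   # m x n cross Gram matrix
    # alpha = K_mm^+ (K_mn 1 / n); the pseudo-inverse handles rank deficiency.
    alpha = np.linalg.pinv(K_mm) @ K_mn.mean(axis=1)
    return Z, alpha


def evaluate_embedding(points, weights, queries, bandwidth=1.0):
    """Evaluate sum_j weights[j] * k(points[j], q) at every query q."""
    return gaussian_kernel(queries, points, bandwidth) @ weights
```

With `m` landmarks the weights cost one `m x m` pseudo-inverse and one `m x n` kernel evaluation, instead of storing all `n` points. Choosing the landmarks uniformly at random, for example with `np.random.default_rng(0).choice(len(X), size=m, replace=False)`, matches the "small random subset" described in the abstract; how large `m` must be is the subject of the paper's bound and is not addressed by this sketch.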

* 8 pages

Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times

Jan 30, 2022

Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Abstract: Computing a Gaussian process (GP) posterior has a computational cost cubic in the number of historical points. A reformulation of the same GP posterior highlights that this complexity mainly depends on how many unique historical points are considered. This can have important implications in active learning settings, where the set of historical points is constructed sequentially by the learner. We show that sequential black-box optimization based on GPs (GP-Opt) can be made efficient by sticking to a candidate solution for multiple evaluation steps and switching only when necessary. Limiting the number of switches also limits the number of unique points in the history of the GP. Thus, the efficient GP reformulation can be used to exactly and cheaply compute the posteriors required to run the GP-Opt algorithms. This approach is especially useful in real-world applications of GP-Opt with high switch costs (e.g. switching chemicals in wet labs, data/model loading in hyperparameter optimization). As examples of this meta-approach, we modify two well-established GP-Opt algorithms, GP-UCB and GP-EI, to switch candidates as infrequently as possible, adapting rules from batched GP-Opt. These versions preserve all the theoretical no-regret guarantees while improving practical aspects of the algorithms such as runtime, memory complexity, and the ability to batch candidates and evaluate them in parallel.
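The dependence on unique points can be illustrated with a standard identity for GPs with Gaussian noise: observing a point $n_i$ times with noise variance $\sigma^2$ is equivalent to observing the average of those values once with noise variance $\sigma^2 / n_i$. The sketch below assumes this identity is the kind of reformulation meant in the abstract; the RBF kernel with unit prior variance and the function names `rbf`, `posterior_full`, and `posterior_unique` are hypothetical choices made for illustration.

```python
import numpy as np


def rbf(X, Y, lengthscale=1.0):
    """RBF kernel with unit prior variance, so k(x, x) = 1."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def posterior_full(X_hist, y_hist, X_query, noise_var):
    """Naive posterior: solves a t x t system, t = length of the history."""
    K = rbf(X_hist, X_hist) + noise_var * np.eye(len(X_hist))
    K_q = rbf(X_query, X_hist)
    mean = K_q @ np.linalg.solve(K, y_hist)
    var = 1.0 - np.sum(K_q * np.linalg.solve(K, K_q.T).T, axis=1)
    return mean, var


def posterior_unique(X_hist, y_hist, X_query, noise_var):
    """Same posterior from a q x q system, q = number of unique points."""
    X_u, inverse, counts = np.unique(
        X_hist, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # guard against shape differences across NumPy versions
    # Average the observations collected at each unique point.
    y_bar = np.bincount(inverse, weights=y_hist) / counts
    # Averaging n_i observations shrinks the noise variance to noise_var / n_i.
    K = rbf(X_u, X_u) + np.diag(noise_var / counts)
    K_q = rbf(X_query, X_u)
    mean = K_q @ np.linalg.solve(K, y_bar)
    var = 1.0 - np.sum(K_q * np.linalg.solve(K, K_q.T).T, axis=1)
    return mean, var
```

Both functions return the same posterior mean and variance, but the second one scales with the number of unique candidates rather than with the total number of evaluations, which is why re-evaluating the same candidate keeps the posterior cheap.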

Efficient Hyperparameter Tuning for Large Scale Kernel Ridge Regression

Jan 17, 2022

Giacomo Meanti, Luigi Carratino, Ernesto De Vito, Lorenzo Rosasco

Abstract: Kernel methods provide a principled approach to nonparametric learning. While their basic implementations scale poorly to large problems, recent advances showed that approximate solvers can efficiently handle massive datasets. A shortcoming of these solutions is that hyperparameter tuning is not taken care of, and is left for the user to perform. Hyperparameters are crucial in practice and the lack of automated tuning greatly hinders efficiency and usability. In this paper, we work to fill this gap, focusing on kernel ridge regression based on the Nyström approximation. After reviewing and contrasting a number of hyperparameter tuning strategies, we propose a complexity regularization criterion based on a data-dependent penalty, and discuss its efficient optimization. Then, we proceed to a careful and extensive empirical evaluation highlighting strengths and weaknesses of the different tuning strategies. Our analysis shows the benefit of the proposed approach, which we therefore incorporate in a library for large-scale kernel methods to derive adaptively tuned solutions.

* 24 pages, 3 figures

Mean Nyström Embeddings for Adaptive Compressive Learning

Oct 21, 2021

Antoine Chatalic, Luigi Carratino, Ernesto De Vito, Lorenzo Rosasco

Abstract: Compressive learning is an approach to efficient large-scale learning based on sketching an entire dataset to a single mean embedding (the sketch), i.e. a vector of generalized moments. The learning task is then approximately solved as an inverse problem using an adapted parametric model. Previous works in this context have focused on sketches obtained by averaging random features, which, while universal, can be poorly adapted to the problem at hand. In this paper, we propose and study the idea of performing sketching based on a data-dependent Nyström approximation. From a theoretical perspective, we prove that the excess risk can be controlled under a geometric assumption relating the parametric model used to learn from the sketch and the covariance operator associated with the task at hand. Empirically, we show for k-means clustering and Gaussian modeling that, for a fixed sketch size, Nyström sketches indeed outperform those built with random features.

* 22 pages, 4 figures

Understanding neural networks with reproducing kernel Banach spaces

Sep 20, 2021

Francesca Bartolucci, Ernesto De Vito, Lorenzo Rosasco, Stefano Vigogna

Abstract: Characterizing the function spaces corresponding to neural networks can provide a way to understand their properties. In this paper, we discuss how the theory of reproducing kernel Banach spaces can be used to tackle this challenge. In particular, we prove a representer theorem for a wide class of reproducing kernel Banach spaces that admit a suitable integral representation and include one-hidden-layer neural networks of possibly infinite width. Further, we show that, for a suitable class of ReLU activation functions, the norm in the corresponding reproducing kernel Banach space can be characterized in terms of the inverse Radon transform of a bounded real measure, with norm given by the total variation norm of the measure. Our analysis simplifies and extends recent results in [34,29,30].

From inexact optimization to learning via gradient concentration

Jun 24, 2021

Bernhard Stankewitz, Nicole Mücke, Lorenzo Rosasco

Abstract: Optimization was recently shown to control the inductive bias in a learning process, a property referred to as implicit, or iterative, regularization. The estimator obtained by iteratively minimizing the training error can generalize well with no need for further penalties or constraints. In this paper, we investigate this phenomenon in the context of linear models with smooth loss functions. In particular, we propose and investigate a proof technique combining ideas from inexact optimization and probability theory, specifically gradient concentration. The proof is easy to follow and yields sharp learning bounds. More generally, it highlights a way to develop optimization results into learning guarantees.
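The phenomenon of iterative regularization for a linear model can be sketched with plain gradient descent on the squared loss, where the number of iterations plays the role of the regularization parameter. This is a minimal illustration under assumptions, not the analysis of the paper: the squared loss, the zero initialization, the step size $1/L$, and the function name `gradient_descent_path` are all choices made here.

```python
import numpy as np


def gradient_descent_path(X, y, n_iters):
    """Run gradient descent on (1 / 2n) * ||X w - y||^2 starting from w = 0.

    Returns an array of shape (n_iters + 1, d) holding every iterate, so
    that an early-stopped estimator can be read off at any iteration.
    """
    n, d = X.shape
    # Step size 1 / L, where L = sigma_max(X)^2 / n is the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    path = [w.copy()]
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n
        w = w - step * grad
        path.append(w.copy())
    return np.array(path)
```

Starting from zero with this step size, the norm of the iterates never decreases along the path, and in the limit the iterates approach the minimum-norm least-squares solution `np.linalg.pinv(X) @ y`. Stopping early therefore selects a smaller-norm estimator without adding any explicit penalty; in practice the stopping time would be chosen on held-out data.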

ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions

Jun 23, 2021

Luigi Carratino, Stefano Vigogna, Daniele Calandriello, Lorenzo Rosasco

Abstract: We introduce ParK, a new large-scale solver for kernel ridge regression. Our approach combines partitioning with random projections and iterative optimization to reduce space and time complexity while provably maintaining the same statistical accuracy. In particular, by constructing suitable partitions directly in the feature space rather than in the input space, we promote orthogonality between the local estimators, thus ensuring that key quantities such as local effective dimension and bias remain under control. We characterize the statistical-computational tradeoff of our model, and demonstrate the effectiveness of our method by numerical experiments on large-scale datasets.

Ada-BKB: Scalable Gaussian Process Optimization on Continuous Domain by Adaptive Discretization

Jun 16, 2021

Marco Rando, Luigi Carratino, Silvia Villa, Lorenzo Rosasco

Abstract: Gaussian process optimization is a successful class of algorithms (e.g. GP-UCB) to optimize a black-box function through sequential evaluations. However, when the domain of the function is continuous, Gaussian process optimization has to either rely on a fixed discretization of the space, or solve a non-convex optimization subproblem at each evaluation. The first approach can negatively affect performance, while the second one puts a heavy computational burden on the algorithm. A third option, which has only recently been studied theoretically, is to adaptively discretize the function domain. Even though this approach avoids the extra non-convex optimization costs, the overall computational complexity is still prohibitive. An algorithm such as GP-UCB has a runtime of $O(T^4)$, where $T$ is the number of iterations. In this paper, we introduce Ada-BKB (Adaptive Budgeted Kernelized Bandit), a no-regret Gaussian process optimization algorithm for functions on continuous domains that provably runs in $O(T^2 d_\text{eff}^2)$, where $d_\text{eff}$ is the effective dimension of the explored space, which is typically much smaller than $T$. We corroborate our findings with experiments on synthetic non-convex functions and on the real-world problem of hyper-parameter optimization.

On the Emergence of Whole-body Strategies from Humanoid Robot Push-recovery Learning

Apr 29, 2021

Diego Ferigo, Raffaello Camoriano, Paolo Maria Viceconte, Daniele Calandriello, Silvio Traversaro, Lorenzo Rosasco, Daniele Pucci

Abstract: Balancing and push-recovery are essential capabilities enabling humanoid robots to solve complex locomotion tasks. In this context, classical control systems tend to be based on simplified physical models and hard-coded strategies. Although successful in specific scenarios, this approach requires demanding tuning of parameters and switching logic between specifically designed controllers to handle more general perturbations. We apply model-free Deep Reinforcement Learning to train a general and robust humanoid push-recovery policy in a simulation environment. Our method targets high-dimensional whole-body humanoid control and is validated on the iCub humanoid. Reward components incorporating expert knowledge on humanoid control enable fast learning of several robust behaviors by the same policy, spanning the entire body. We validate our method with extensive quantitative analyses in simulation, including out-of-sample tasks which demonstrate policy robustness and generalization, both key requirements for real-world robot deployment.

* Co-first authors: Diego Ferigo and Raffaello Camoriano; 8 pages
