Marius Kloft

Efficient Gaussian Process Classification Using Polya-Gamma Data Augmentation

Feb 18, 2018
Florian Wenzel, Theo Galy-Fajou, Christian Donner, Marius Kloft, Manfred Opper

We propose an efficient stochastic variational approach to GP classification building on Polya-Gamma data augmentation and inducing points, which is based on closed-form updates of natural gradients. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to three orders of magnitude faster than state-of-the-art methods while being competitive in terms of prediction performance.
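
The closed-form updates rest on a standard property of Pólya-Gamma augmentation. For $y_i \in \{-1, 1\}$, the logistic likelihood can be written as $\sigma(y_i f_i) = \tfrac{1}{2}\exp(y_i f_i / 2)\,\mathbb{E}_{\omega_i \sim PG(1,0)}[\exp(-\omega_i f_i^2 / 2)]$. Conditioned on $\omega_i$, it is therefore Gaussian in $f_i$, and the mean of $\omega_i \sim PG(1, c)$ has the closed form $\tanh(c/2)/(2c)$.

The sketch below computes these per-point quantities. It assumes a Gaussian variational marginal $q(f_i) = N(m_i, s_i)$ and the mean-field choice $q(\omega_i) = PG(1, c_i)$ with $c_i = \sqrt{m_i^2 + s_i}$. The function names are hypothetical, and this is not the authors' implementation. As a cross-check, the closed form is compared with the truncated infinite-sum representation of $PG(1, c)$.

```python
import math


def pg_mean(c: float) -> float:
    """Mean of omega ~ PG(1, c): tanh(c / 2) / (2c), with limit 1/4 as c -> 0."""
    c = abs(c)
    if c < 1e-8:
        return 0.25
    return math.tanh(c / 2.0) / (2.0 * c)


def pg_mean_series(c: float, terms: int = 200_000) -> float:
    """Same mean from PG(1, c) = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    where g_k ~ Gamma(1, 1) has mean 1; the series is truncated after `terms` summands."""
    shift = c * c / (4.0 * math.pi ** 2)
    total = sum(1.0 / ((k - 0.5) ** 2 + shift) for k in range(1, terms + 1))
    return total / (2.0 * math.pi ** 2)


def local_update(m: float, s: float) -> tuple[float, float]:
    """Hypothetical per-point update for q(f_i) = N(m, s):
    c_i = sqrt(E[f_i^2]) = sqrt(m^2 + s); returns (c_i, E[omega_i])."""
    c = math.sqrt(m * m + s)
    return c, pg_mean(c)


for c in (0.0, 1.0, 5.0):
    print(c, pg_mean(c), pg_mean_series(c))
print(local_update(3.0, 16.0))  # c_i = 5.0
```

The expected value $\mathbb{E}[\omega_i]$ is what enters the Gaussian (conjugate) update of $f$. This is why no sampling of $\omega_i$ is needed in a variational scheme of this kind.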

Data-dependent Generalization Bounds for Multi-class Classification

Dec 29, 2017
Yunwen Lei, Urun Dogan, Ding-Xuan Zhou, Marius Kloft

In this paper, we study data-dependent generalization error bounds exhibiting a mild dependency on the number of classes, making them suitable for multi-class learning with a large number of label classes. The bounds generally hold for empirical multi-class risk minimization algorithms using an arbitrary norm as regularizer. Key to our analysis are new structural results for multi-class Gaussian complexities and empirical $\ell_\infty$-norm covering numbers, which exploit the Lipschitz continuity of the loss function with respect to the $\ell_2$- and $\ell_\infty$-norm, respectively. We establish data-dependent error bounds in terms of complexities of a linear function class defined on a finite set induced by training examples, for which we show tight lower and upper bounds. We apply the results to several prominent multi-class learning machines, exhibiting a tighter dependency on the number of classes than the state of the art. For instance, for the multi-class SVM by Crammer and Singer (2002), we obtain a data-dependent bound with a logarithmic dependency which significantly improves the previous square-root dependency. Experimental results are reported to verify the effectiveness of our theoretical findings.
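
The Crammer and Singer multi-class SVM mentioned above uses the loss $\max(0, 1 + \max_{k \neq y} s_k - s_y)$ on a score vector $s$. Both $\max_{k \neq y} s_k$ and $s_y$ change by at most $\|s - s'\|_\infty$. The loss is therefore 2-Lipschitz with respect to the $\ell_\infty$-norm, regardless of the number of classes, which is the kind of continuity the analysis above exploits.

The sketch below (hypothetical function names) implements the loss and checks that constant empirically on random score vectors. It illustrates the property only and does not reproduce the paper's bounds.

```python
import random


def crammer_singer_loss(scores: list[float], y: int) -> float:
    """max(0, 1 + max_{k != y} s_k - s_y) for a score vector s and true class y."""
    best_other = max(s for k, s in enumerate(scores) if k != y)
    return max(0.0, 1.0 + best_other - scores[y])


def worst_linf_ratio(num_classes: int = 50, trials: int = 1000, seed: int = 0) -> float:
    """Largest observed |L(s) - L(t)| / ||s - t||_inf over random pairs of nearby score vectors."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        s = [rng.uniform(-3.0, 3.0) for _ in range(num_classes)]
        t = [v + rng.uniform(-0.5, 0.5) for v in s]
        y = rng.randrange(num_classes)
        gap = max(abs(a - b) for a, b in zip(s, t))
        if gap > 0.0:
            diff = abs(crammer_singer_loss(s, y) - crammer_singer_loss(t, y))
            worst = max(worst, diff / gap)
    return worst


print(crammer_singer_loss([2.0, 1.5, -1.0], 0))  # 0.5
print(worst_linf_ratio())  # at most 2, up to floating-point rounding
```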

Bayesian Nonlinear Support Vector Machines for Big Data

Jul 18, 2017
Florian Wenzel, Theo Galy-Fajou, Matthaeus Deutsch, Marius Kloft

We propose a fast inference method for Bayesian nonlinear support vector machines that leverages stochastic variational inference and inducing points. Our experiments show that the proposed method is faster than competing Bayesian approaches and scales easily to millions of data points. It provides additional features over frequentist competitors such as accurate predictive uncertainty estimates and automatic hyperparameter search.

* accepted as conference paper at ECML-PKDD 2017
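
Bayesian SVM formulations are commonly built on the scale-mixture representation of the hinge loss due to Polson and Scott (2011). The pseudo-likelihood $\exp(-2\max(0, 1 - y f))$ equals $\int_0^\infty (2\pi\lambda)^{-1/2}\exp(-(1 + \lambda - y f)^2 / (2\lambda))\,d\lambda$, so it becomes Gaussian in $f$ given the auxiliary variable $\lambda$. The abstract above does not spell out its model, so treat this as background rather than as the paper's inference algorithm.

The sketch below (hypothetical function names) verifies the identity numerically for a few margins $u = y f$.

```python
import math


def hinge_pseudo_likelihood(u: float) -> float:
    """exp(-2 * max(0, 1 - u)), where u = y * f(x) is the margin of one data point."""
    return math.exp(-2.0 * max(0.0, 1.0 - u))


def scale_mixture(u: float, t_max: float = 20.0, steps: int = 40_000) -> float:
    """Numerically integrate (2*pi*lam)^(-1/2) * exp(-(1 + lam - u)^2 / (2*lam)) over lam > 0.

    With the substitution lam = t^2 (so d lam = 2t dt) the lam^(-1/2) factor cancels,
    leaving a smooth integrand in t that decays like exp(-t^2 / 2); the midpoint rule
    on [0, t_max] never evaluates t = 0.
    """
    a = 1.0 - u
    h = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(-((a + t * t) ** 2) / (2.0 * t * t))
    return 2.0 / math.sqrt(2.0 * math.pi) * total * h


for u in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(u, hinge_pseudo_likelihood(u), scale_mixture(u))
```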

Sparse Probit Linear Mixed Model

Jul 17, 2017
Stephan Mandt, Florian Wenzel, Shinichi Nakajima, John P. Cunningham, Christoph Lippert, Marius Kloft

Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they allow one to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity, and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), which generalizes the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that, in the setting of binary labels, our algorithm leads to better prediction accuracies and also selects features that show less correlation with the confounding factors.

* Machine Learning, 106(9), 1621-1642 (2017)
* Published version, 21 pages, 6 figures
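
A small calculation shows where the closed form is lost. The sketch makes simplifying assumptions: one scalar random effect per sample and unit-variance noise, with hypothetical names. Let $y = 1$ exactly when $\eta + u + \varepsilon > 0$, with $u \sim N(0, \sigma_u^2)$ and $\varepsilon \sim N(0, 1)$. Integrating out $u$ for a single sample still gives a probit, $\Phi(\eta / \sqrt{1 + \sigma_u^2})$, and a Monte Carlo simulation confirms it.

In a mixed model, however, the random effects are correlated across samples. The joint likelihood then becomes an $n$-dimensional Gaussian orthant probability that no longer factorizes over samples.

```python
import math
import random


def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_marginal(eta: float, sigma_u: float) -> float:
    """P(y = 1) with u ~ N(0, sigma_u^2) integrated out: eta + u + eps ~ N(eta, 1 + sigma_u^2)."""
    return std_normal_cdf(eta / math.sqrt(1.0 + sigma_u ** 2))


def monte_carlo_marginal(eta: float, sigma_u: float, n: int = 100_000, seed: int = 0) -> float:
    """Simulate the latent-variable model directly and return the fraction of samples with y = 1."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if eta + rng.gauss(0.0, sigma_u) + rng.gauss(0.0, 1.0) > 0.0)
    return hits / n


print(probit_marginal(0.5, 1.0), monte_carlo_marginal(0.5, 1.0))
```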

Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning

Feb 09, 2017
Niloofar Yousefi, Yunwen Lei, Marius Kloft, Mansooreh Mollaghasemi, Georgios Anagnostopoulos

We show a Talagrand-type concentration inequality for Multi-Task Learning (MTL), with which we establish sharp excess risk bounds for MTL in terms of distribution- and data-dependent versions of the Local Rademacher Complexity (LRC). We also give a new bound on the LRC for norm-regularized as well as strongly convex hypothesis classes, which applies not only to MTL but also to the standard i.i.d. setting. Combining both results, one can now easily derive fast-rate bounds on the excess risk for many prominent MTL methods, including, as we demonstrate, Schatten-norm, group-norm, and graph-regularized MTL. The derived bounds reflect a relationship akin to a conservation law of asymptotic convergence rates. This very relationship allows for trading off slower rates with respect to the number of tasks for faster rates with respect to the number of available samples per task, when compared to the rates obtained via a traditional, global Rademacher analysis.

* In this version, some arguments and results (of the previous version) have been corrected or modified
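
For comparison, the traditional global analysis works with the empirical Rademacher complexity $\hat{R} = \mathbb{E}_\sigma \sup_f \frac{1}{n}\sum_i \sigma_i f(x_i)$. The sketch below assumes a single task and the norm-bounded linear class $\{x \mapsto \langle w, x\rangle : \|w\|_2 \le B\}$. This class is chosen for illustration and is not the paper's multi-task, local complexity.

By Cauchy-Schwarz the supremum equals $\frac{B}{n}\|\sum_i \sigma_i x_i\|_2$. The expectation over the signs can therefore be estimated by Monte Carlo and compared with the Jensen bound $\frac{B}{n}\sqrt{\sum_i \|x_i\|_2^2}$. The function names are hypothetical.

```python
import math
import random


def empirical_rademacher_linear(xs: list[list[float]], bound: float,
                                n_draws: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_sigma[(B / n) * ||sum_i sigma_i x_i||_2]."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    total = 0.0
    for _ in range(n_draws):
        s = [0.0] * d
        for x in xs:
            sigma = rng.choice((-1.0, 1.0))  # Rademacher sign
            for j in range(d):
                s[j] += sigma * x[j]
        total += bound * math.sqrt(sum(v * v for v in s)) / n
    return total / n_draws


def jensen_upper_bound(xs: list[list[float]], bound: float) -> float:
    """(B / n) * sqrt(sum_i ||x_i||^2), using E||S|| <= sqrt(E||S||^2)."""
    return bound * math.sqrt(sum(sum(v * v for v in x) for x in xs)) / len(xs)


xs = [[float(i % 3), float((i * 7) % 5 - 2), 1.0] for i in range(20)]
print(empirical_rademacher_linear(xs, bound=2.0), jensen_upper_bound(xs, bound=2.0))
```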

Distributed Optimization of Multi-Class SVMs

Dec 08, 2016
Maximilian Alber, Julian Zimmert, Urun Dogan, Marius Kloft

Training of one-vs.-rest SVMs can be parallelized over the classes in a straightforward way. Given enough computational resources, one-vs.-rest SVMs can thus be trained on data involving a large number of classes. The same cannot be stated, however, for the so-called all-in-one SVMs, which require solving a quadratic program whose size grows quadratically with the number of classes. We develop distributed algorithms for two all-in-one SVM formulations (Lee et al. and Weston and Watkins) that parallelize the computation evenly over the number of classes. This allows us to compare these models to one-vs.-rest SVMs at an unprecedented scale. The results indicate superior accuracy on text classification data.
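
Why one-vs.-rest training parallelizes over classes can be made concrete: each binary problem (class $c$ versus all others) only reads the shared data and writes its own weight vector. The sketch below makes these assumptions:
- linear L1-loss SVMs, with the bias folded into a constant feature;
- training by dual coordinate descent in the style of LIBLINEAR, a swapped-in solver and not the all-in-one algorithms developed in the paper;
- hypothetical function names and toy data.

A thread pool stands in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor


def train_binary_svm(xs, labels, C=10.0, epochs=200):
    """Dual coordinate descent for min_w 0.5*||w||^2 + C * sum_i max(0, 1 - y_i <w, x_i>)."""
    w = [0.0] * len(xs[0])
    alpha = [0.0] * len(xs)
    q = [sum(v * v for v in x) for x in xs]  # diagonal of the Gram matrix
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, labels)):
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            old = alpha[i]
            alpha[i] = min(max(old - grad / q[i], 0.0), C)  # exact 1-D minimization, clipped to [0, C]
            delta = (alpha[i] - old) * y
            if delta != 0.0:
                w = [wj + delta * xj for wj, xj in zip(w, x)]  # keep w = sum_i alpha_i y_i x_i
    return w


def train_one_vs_rest(xs, ys, num_classes, workers=4):
    """Train one binary SVM per class concurrently; the subproblems share no mutable state."""
    def fit(c):
        return train_binary_svm(xs, [1.0 if y == c else -1.0 for y in ys])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit, range(num_classes)))


def predict(ws, x):
    """Return the class whose binary classifier gives the largest score."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for w in ws]
    return max(range(len(ws)), key=scores.__getitem__)


# Toy data: three 2-D clusters; each point gets a constant bias feature 1.0.
points = {0: [(2.0, 0.0), (2.5, 0.5), (1.5, -0.5)],
          1: [(0.0, 2.0), (0.5, 2.5), (-0.5, 1.5)],
          2: [(-2.0, -2.0), (-2.5, -1.5), (-1.5, -2.5)]}
xs = [[a, b, 1.0] for c in points for a, b in points[c]]
ys = [c for c in points for _ in points[c]]
ws = train_one_vs_rest(xs, ys, num_classes=3)
print([predict(ws, x) for x in xs], ys)
```

Python threads share the interpreter lock, so this shows the decomposition rather than a speedup. Real deployments would use processes or separate machines. All-in-one formulations do not split this way, because their constraints couple all class weight vectors.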

Feature Importance Measure for Non-linear Learning Algorithms

Nov 22, 2016
Marina M.-C. Vidovic, Nico Görnitz, Klaus-Robert Müller, Marius Kloft

Complex problems may require sophisticated, non-linear learning methods such as kernel machines or deep neural networks to achieve state-of-the-art prediction accuracy. However, high prediction accuracy is not the only objective to consider when solving problems using machine learning. Particular scientific applications also require some explanation of the learned prediction function. Unfortunately, most methods do not come with a straightforward, out-of-the-box interpretation. Even linear prediction functions are not straightforward to explain if features exhibit a complex correlation structure. In this paper, we propose the Measure of Feature Importance (MFI). MFI is general and can be applied to any learning machine (including kernel machines and deep learning). MFI is intrinsically non-linear and can detect features that by themselves are inconspicuous and only impact the prediction function through their interaction with other features. Lastly, MFI can be used for both model-based and instance-based feature importance (i.e., measuring the importance of a feature for a particular data point).

* Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
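
The claim that a feature can matter only through interactions is easy to demonstrate. The sketch uses permutation importance, a different and much simpler model-agnostic measure swapped in for illustration; it is not MFI.

The model is hypothetical: $f(x) = x_0 x_1$ over features in $\{-1, +1\}$, with a third feature that the model ignores. On a balanced sample, $x_0$ and $x_1$ are each exactly uncorrelated with the output, yet shuffling either one flips roughly half of the predictions.

```python
import itertools
import random


def model(x):
    """Hypothetical prediction function: an XOR-style interaction; x[2] is ignored."""
    return x[0] * x[1]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)


def permutation_importance(f, xs, j, seed=0):
    """Mean squared change of f's output when column j is randomly permuted."""
    rng = random.Random(seed)
    column = [x[j] for x in xs]
    rng.shuffle(column)
    perturbed = [x[:j] + [v] + x[j + 1:] for x, v in zip(xs, column)]
    return sum((f(x) - f(p)) ** 2 for x, p in zip(xs, perturbed)) / len(xs)


# Balanced design: every point of {-1, +1}^3, repeated 50 times (400 rows).
xs = [list(c) for _ in range(50) for c in itertools.product((-1.0, 1.0), repeat=3)]
outputs = [model(x) for x in xs]
for j in range(3):
    print(j, correlation([x[j] for x in xs], outputs), permutation_importance(model, xs, j))
```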

Localized Multiple Kernel Learning---A Convex Approach

Oct 13, 2016
Yunwen Lei, Alexander Binder, Ürün Dogan, Marius Kloft

We propose a localized approach to multiple kernel learning that can be formulated as a convex optimization problem over a given cluster structure, for which we obtain generalization error guarantees and derive an optimization algorithm based on the Fenchel dual representation. Experiments on real-world datasets from the application domains of computational biology and computer vision show that convex localized multiple kernel learning can achieve higher prediction accuracies than its global and non-convex local counterparts.

* to appear in ACML 2016
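
One way to make a kernel combination local is to let the kernel weights depend on the cluster of each input. The sketch below illustrates that idea hypothetically; it is not the paper's formulation. Given cluster assignments $c(x)$ and nonnegative weights $\beta_{c,m}$, it uses $k(x, x') = \sum_m \sqrt{\beta_{c(x),m}\,\beta_{c(x'),m}}\,k_m(x, x')$.

Each term has the form $g(x)\,g(x')\,k_m(x, x')$, which preserves positive semidefiniteness. The combined kernel is therefore valid, which keeps a downstream SVM problem convex. A small Cholesky factorization checks this numerically; all names and data are illustrative.

```python
import math


def linear_kernel(x, z):
    return x * z


def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * (x - z) ** 2)


def cluster_of(x):
    """Hypothetical cluster structure: split 1-D inputs by sign."""
    return 0 if x < 0 else 1


def localized_kernel(x, z, beta, kernels):
    """sum_m sqrt(beta[c(x)][m] * beta[c(z)][m]) * k_m(x, z)."""
    cx, cz = cluster_of(x), cluster_of(z)
    return sum(math.sqrt(beta[cx][m] * beta[cz][m]) * k(x, z) for m, k in enumerate(kernels))


def is_positive_definite(matrix, jitter=1e-9):
    """Attempt a Cholesky factorization of matrix + jitter * I; success means (numerically) PSD."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] + (jitter if i == j else 0.0)
            s -= sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
beta = [[0.8, 0.2], [0.1, 0.9]]  # per-cluster weights for (linear, rbf)
kernels = [linear_kernel, rbf_kernel]
gram = [[localized_kernel(x, z, beta, kernels) for z in xs] for x in xs]
print(is_positive_definite(gram))
```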

Framework for Multi-task Multiple Kernel Learning and Applications in Genome Analysis

Jun 30, 2015
Christian Widmer, Marius Kloft, Vipin T Sreedharan, Gunnar Rätsch

We present a general regularization-based framework for Multi-task learning (MTL), in which the similarity between tasks can be learned or refined using $\ell_p$-norm Multiple Kernel Learning (MKL). Based on this very general formulation (including a general loss function), we derive the corresponding dual formulation using Fenchel duality applied to Hermitian matrices. We show that numerous established MTL methods can be derived as special cases from both the primal and the dual of our formulation. Furthermore, we derive a modern dual coordinate descent optimization strategy for the hinge-loss variant of our formulation and provide convergence bounds for our algorithm. As a special case, we implement in C++ a fast LibLinear-style solver for $\ell_p$-norm MKL. In the experimental section, we analyze various aspects of our algorithm, such as predictive performance and the ability to reconstruct task relationships, on biologically inspired synthetic data, where we have full control over the underlying ground truth. We also experiment on a new dataset from the domain of computational biology that we collected for the purpose of this paper. It concerns the prediction of transcription start sites (TSS) over nine organisms, which is a crucial task in gene finding. Our solvers, including all discussed special cases, are made available as open-source software as part of the SHOGUN machine learning toolbox (available at http://shogun.ml).
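
In $\ell_p$-norm MKL, the kernel weights can be updated in closed form from the current per-kernel norms $\|w_m\|$. The analytical update known from $\ell_p$-norm MKL is $\theta_m = \|w_m\|^{2/(p+1)} / (\sum_{m'} \|w_{m'}\|^{2p/(p+1)})^{1/p}$, and it always lands on the unit $\ell_p$ sphere. The sketch below implements it. It is an illustration only, not the SHOGUN solver or the paper's dual coordinate descent, and the function names are hypothetical.

```python
def lp_mkl_weights(norms: list[float], p: float) -> list[float]:
    """Closed-form l_p-MKL kernel weights from per-kernel norms ||w_m|| (p >= 1)."""
    denom = sum(n ** (2.0 * p / (p + 1.0)) for n in norms) ** (1.0 / p)
    return [n ** (2.0 / (p + 1.0)) / denom for n in norms]


def lp_norm(v: list[float], p: float) -> float:
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


norms = [3.0, 1.0, 0.5, 0.0]  # illustrative ||w_m|| values for four kernels
for p in (1.0, 2.0, 8.0):
    theta = lp_mkl_weights(norms, p)
    print(p, [round(t, 3) for t in theta], lp_norm(theta, p))
```

As $p$ grows, the exponent $2/(p+1)$ shrinks, so the nonzero weights become more uniform. At $p = 1$ the weights are proportional to the norms themselves. A kernel whose $w_m$ vanishes always receives weight zero.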

Multi-class SVMs: From Tighter Data-Dependent Generalization Bounds to Novel Algorithms

Jun 14, 2015
Yunwen Lei, Ürün Dogan, Alexander Binder, Marius Kloft

This paper studies the generalization performance of multi-class classification algorithms, for which we obtain, for the first time, a data-dependent generalization error bound with a logarithmic dependence on the class size, substantially improving the state-of-the-art linear dependence in the existing data-dependent generalization analysis. The theoretical analysis motivates us to introduce a new multi-class classification machine based on $\ell_p$-norm regularization, where the parameter $p$ controls the complexity of the corresponding bounds. We derive an efficient optimization algorithm based on Fenchel duality theory. Benchmarks on several real-world datasets show that the proposed algorithm can achieve significant accuracy gains over the state of the art.
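
The sketch below assumes one natural reading of $\ell_p$-norm regularization for a multi-class machine; the abstract does not state the form. Under this assumption, the regularizer is a class-wise mixed norm: the $\ell_2$ norm of each class's weight vector, aggregated by an $\ell_p$ norm across classes. The sketch only shows how $p$ moves this quantity from the sum of class norms ($p = 1$), through the Frobenius norm ($p = 2$), to the largest class norm ($p \to \infty$). The function name is hypothetical.

```python
import math


def mixed_norm(weights: list[list[float]], p: float) -> float:
    """||(||w_1||_2, ..., ||w_C||_2)||_p over per-class weight vectors; p = math.inf gives the max."""
    class_norms = [math.sqrt(sum(v * v for v in w)) for w in weights]
    if math.isinf(p):
        return max(class_norms)
    return sum(n ** p for n in class_norms) ** (1.0 / p)


weights = [[3.0, 4.0], [0.0, 1.0], [1.0, 0.0]]  # per-class norms: 5, 1, 1
for p in (1.0, 2.0, 4.0, math.inf):
    print(p, mixed_norm(weights, p))
```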
