Hal Daume III

University of Maryland

Learning Task Grouping and Overlap in Multi-task Learning

Jun 27, 2012

Abhishek Kumar, Hal Daume III

Abstract: In the paradigm of multi-task learning, multiple related prediction tasks are learned jointly, sharing information across the tasks. We propose a framework for multi-task learning that enables one to selectively share information across the tasks. We assume that each task parameter vector is a linear combination of a finite number of underlying basis tasks. The coefficients of the linear combination are sparse, and the overlap in the sparsity patterns of two tasks controls the amount of sharing between them. Our model is based on the assumption that task parameters within a group lie in a low-dimensional subspace, but it allows tasks in different groups to overlap with each other in one or more bases. Experimental results on four datasets show that our approach outperforms competing methods.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
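To make the structure in this abstract concrete, the sketch below takes the abstract's assumption at face value: each task's parameter vector is $w_t = L s_t$ for a shared basis matrix $L$ (d x k) and a sparse coefficient vector $s_t$. The helper names `task_parameters` and `shared_bases` are hypothetical. The sketch only composes given matrices and reads off which bases two tasks share. It does not learn $L$ or the sparse codes, and it does not reproduce the paper's optimization.

```python
from itertools import combinations

def task_parameters(L, S):
    """Compose w_t = L s_t for every task t.

    L: d x k basis matrix, S: k x T coefficient matrix (both lists of rows).
    Returns T weight vectors, each of length d.
    """
    d, k, T = len(L), len(S), len(S[0])
    return [[sum(L[i][j] * S[j][t] for j in range(k)) for i in range(d)]
            for t in range(T)]

def shared_bases(S, tol=0.0):
    """For each pair of tasks, return the basis indices used by both.

    A basis j counts as 'used' by task t when |S[j][t]| > tol, so the
    overlap of two sparsity patterns is the intersection of their supports.
    """
    k, T = len(S), len(S[0])
    support = [{j for j in range(k) if abs(S[j][t]) > tol} for t in range(T)]
    return {(a, b): sorted(support[a] & support[b])
            for a, b in combinations(range(T), 2)}

# Toy example: 3 basis tasks in 4 dimensions, 4 tasks.
# Tasks 0 and 1 use bases {0, 1}; tasks 2 and 3 use bases {1, 2},
# so the two groups overlap only in basis 1.
L = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 1]]
S = [[1.0, 0.5, 0.0, 0.0],
     [2.0, 1.0, 1.0, 0.5],
     [0.0, 0.0, 3.0, 1.0]]

W = task_parameters(L, S)
print(W[0])                     # [1.0, 2.0, 0.0, 3.0]
print(shared_bases(S)[(0, 2)])  # [1]
```

Tasks in the same group have identical supports, while tasks from different groups overlap only partially. That partial overlap is how groups can share one or more bases without being forced into a single subspace.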

A Binary Classification Framework for Two-Stage Multiple Kernel Learning

Jun 27, 2012

Abhishek Kumar, Alexandru Niculescu-Mizil, Koray Kavukcuoglu, Hal Daume III

Abstract: With the advent of kernel methods, automating the task of specifying a suitable kernel has become increasingly important. In this context, the Multiple Kernel Learning (MKL) problem of finding a combination of pre-specified base kernels that is suitable for the task at hand has received significant attention from researchers. In this paper we show that Multiple Kernel Learning can be framed as a standard binary classification problem with additional constraints that ensure the positive definiteness of the learned kernel. Framing MKL in this way has the distinct advantage that it makes it easy to leverage the extensive research in binary classification to develop better performing and more scalable MKL algorithms that are conceptually simpler and, arguably, more accessible to practitioners. Experiments on nine data sets from different domains show that, despite its simplicity, the proposed technique compares favorably with current leading MKL approaches.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
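The abstract frames MKL as binary classification with constraints that keep the learned kernel positive definite. The sketch below shows one natural way to set that up, under stated assumptions. Each pair of training points becomes an example whose features are the base-kernel values $K_m(x_i, x_j)$ and whose label is $y_i y_j$. The kernel weights are kept nonnegative by clipping, which is sufficient because a nonnegative combination of PSD kernels is PSD. The hinge-loss subgradient learner and all function names are illustrative stand-ins, not the paper's algorithm.

```python
from itertools import combinations

def pair_dataset(X, y, kernels):
    """Build pairwise examples: features are base-kernel values, label is y_i * y_j."""
    Z, t = [], []
    for i, j in combinations(range(len(X)), 2):
        Z.append([k(X[i], X[j]) for k in kernels])
        t.append(y[i] * y[j])
    return Z, t

def learn_kernel_weights(Z, t, lr=0.1, epochs=200, reg=0.01):
    """Projected subgradient descent on an L2-regularized hinge loss.

    After every step the weights are clipped to be >= 0, so the learned
    kernel sum_m mu[m] * K_m is a nonnegative combination of PSD kernels
    and therefore PSD.
    """
    mu = [0.0] * len(Z[0])
    for _ in range(epochs):
        for z, label in zip(Z, t):
            margin = label * sum(w * f for w, f in zip(mu, z))
            for m in range(len(mu)):
                hinge = -label * z[m] if margin < 1 else 0.0
                mu[m] = max(0.0, mu[m] - lr * (reg * mu[m] + hinge))
    return mu

def combined_kernel(mu, kernels):
    """Return the learned kernel K(a, b) = sum_m mu[m] * K_m(a, b)."""
    return lambda a, b: sum(w * k(a, b) for w, k in zip(mu, kernels))

# Toy 1-D data: a linear kernel separates the classes, a constant kernel does not.
linear = lambda a, b: a * b
constant = lambda a, b: 1.0
X, y = [-2.0, -1.0, 1.0, 2.0], [-1, -1, 1, 1]
Z, t = pair_dataset(X, y, [linear, constant])
mu = learn_kernel_weights(Z, t)
print(mu)  # expect mu[0] > mu[1]: the weight concentrates on the linear kernel
```

Any linear classifier that supports nonnegativity constraints could replace the toy learner. This is the practical appeal the abstract points to.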

Efficient Protocols for Distributed Classification and Optimization

Apr 16, 2012

Hal Daume III, Jeff M. Phillips, Avishek Saha, Suresh Venkatasubramanian

Abstract: In distributed learning, the goal is to perform a learning task over data distributed across multiple nodes with minimal (expensive) communication. Prior work (Daume III et al., 2012) proposes a general model that bounds the communication required for learning classifiers while allowing for $\epsilon$ training error on linearly separable data adversarially distributed across nodes. In this work, we develop key improvements and extensions to this basic model. Our first result is a two-party protocol based on multiplicative-weight updates that uses $O(d^2 \log(1/\epsilon))$ words of communication to classify distributed data in arbitrary dimension $d$, $\epsilon$-optimally. This readily extends to classification over $k$ nodes with $O(k d^2 \log(1/\epsilon))$ words of communication. Our proposed protocol is simple to implement and is considerably more efficient than the baselines we compare against, as demonstrated by our empirical results. In addition, we illustrate general algorithm design paradigms for efficient learning over distributed data. We show how to solve fixed-dimensional and high-dimensional linear programming efficiently in a distributed setting where constraints may be distributed across nodes. Since many learning problems can be viewed as convex optimization problems where constraints are generated by individual points, this models many typical distributed learning scenarios. Our techniques make use of a novel connection to multipass streaming, and they also adapt the multiplicative-weight-update framework more generally to a distributed setting. As a consequence, our methods extend to the wide range of problems solvable using these techniques.
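The multiplicative-weight idea behind the two-party protocol can be illustrated with a toy simulation. This is a hypothetical protocol written for illustration, not the one analyzed in the paper, and it carries none of the $O(d^2 \log(1/\epsilon))$ guarantees. Node A fits a classifier to its own data plus whatever B has sent. Node B doubles the weight of each of its points that the classifier misclassifies, then sends a small weighted sample back. One-dimensional threshold classifiers on separable data keep the example short. `mwu_two_party` and `fit_threshold` are assumed names.

```python
import random

def fit_threshold(points):
    """Fit a 1-D threshold consistent with separable (x, y) points, y in {-1, +1}.

    Assumes both labels are present. The threshold sits midway between the
    largest negative example and the smallest positive one.
    """
    neg = max(x for x, y in points if y < 0)
    pos = min(x for x, y in points if y > 0)
    t = (neg + pos) / 2
    return lambda x: 1 if x > t else -1

def mwu_two_party(A, B, fit, rounds=50, sample_size=2, seed=0):
    """Toy two-party protocol; returns (classifier, number of points B sent)."""
    rng = random.Random(seed)
    weights = [1.0] * len(B)
    received, sent = [], 0
    h = fit(A)                       # A's first classifier, sent to B
    for _ in range(rounds):
        wrong = [i for i, (x, y) in enumerate(B) if h(x) != y]
        if not wrong:                # B accepts: h fits B's data (and A's, which it was fit on)
            break
        for i in wrong:
            weights[i] *= 2.0        # multiplicative-weight update on B's side
        picks = rng.choices(range(len(B)), weights=weights, k=sample_size)
        received.extend(B[i] for i in picks)
        sent += sample_size          # communication cost, counted in points
        h = fit(A + received)        # A refits on its data plus B's sample
    return h, sent

A = [(-3.0, -1), (3.0, 1)]
B = [(0.5, -1), (1.0, 1), (-2.0, -1)]
h, sent = mwu_two_party(A, B, fit_threshold)
print(sent, all(h(x) == y for x, y in A + B))
```

The doubling step makes persistently misclassified points increasingly likely to be sent, so few rounds are needed in practice. The paper's protocol is also based on multiplicative-weight updates, but its details and communication analysis are not reproduced here.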

Protocols for Learning Classifiers on Distributed Data

Feb 27, 2012

Hal Daume III, Jeff M. Phillips, Avishek Saha, Suresh Venkatasubramanian

Abstract: We consider the problem of learning classifiers for labeled data that has been distributed across several nodes. Our goal is to find a single classifier, with small approximation error, across all datasets while minimizing the communication between nodes. This setting models real-world communication bottlenecks in the processing of massive distributed datasets. We present several very general sampling-based solutions as well as some two-way protocols which have a provable exponential speed-up over any one-way protocol. We focus on core problems for noiseless data distributed across two or more nodes. The techniques we introduce are reminiscent of active learning, but rather than actively probing labels, nodes actively communicate with each other, each node simultaneously learning the important data from another node.

* 19 pages, 12 figures, accepted at AISTATS 2012

A Geometric View of Conjugate Priors

May 01, 2010

Arvind Agarwal, Hal Daume III

Abstract: In Bayesian machine learning, conjugate priors are popular, mostly due to mathematical convenience. In this paper, we show that there are deeper reasons for choosing a conjugate prior. Specifically, we formulate the conjugate prior in the form of a Bregman divergence and show that it is the inherent geometry of conjugate priors that makes them appropriate and intuitive. This geometric interpretation allows one to view the hyperparameters of conjugate priors as the effective sample points, thus providing additional intuition. We use this geometric understanding of conjugate priors to derive the hyperparameters and the expression for the prior used to couple the generative and discriminative components of a hybrid model for semi-supervised learning.

* 16 pages, 4 Figures
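The "effective sample points" reading of hyperparameters is easiest to see in the Beta-Bernoulli pair. This standard conjugate example was chosen here for illustration; the paper's Bregman-divergence formulation is more general and is not reproduced. With a Beta(α, β) prior and integer hyperparameters, the posterior mean equals the plain average of the observed data augmented with α pseudo-successes and β pseudo-failures. The function names are hypothetical.

```python
from fractions import Fraction

def beta_posterior(alpha, beta, data):
    """Conjugate update for Bernoulli data: Beta(alpha, beta) -> Beta(alpha + s, beta + f)."""
    s = sum(data)                    # number of successes (1s)
    return alpha + s, beta + (len(data) - s)

def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact with Fraction."""
    return Fraction(alpha, alpha + beta)

def augmented_mean(alpha, beta, data):
    """Sample mean after appending alpha pseudo-successes and beta pseudo-failures."""
    augmented = list(data) + [1] * alpha + [0] * beta
    return Fraction(sum(augmented), len(augmented))

data = [1, 0, 1, 1, 0, 1]
a, b = beta_posterior(2, 3, data)
print(a, b)                                              # 6 5
print(posterior_mean(a, b), augmented_mean(2, 3, data))  # 6/11 6/11
```

The hyperparameters enter the posterior exactly as extra observations would. That is the intuition the abstract generalizes.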

Exponential Family Hybrid Semi-Supervised Learning

Mar 02, 2010

Arvind Agarwal, Hal Daume III

Abstract: We present an approach to semi-supervised learning based on an exponential family characterization. Our approach generalizes previous work on coupled priors for hybrid generative/discriminative models. Our model is more flexible and natural than previous approaches. Experimental results on several data sets show that our approach also performs better in practice.

* Twenty-First International Joint Conference on Artificial Intelligence 2009, pp. 974-979
* 6 pages, 3 figures
