Shai Shalev-Shwartz

Hebrew University

ShareBoost: Efficient Multiclass Learning with Feature Sharing

Sep 05, 2011

Shai Shalev-Shwartz, Yonatan Wexler, Amnon Shashua

Abstract: Multiclass prediction is the problem of classifying an object into a relevant target class. We consider the problem of learning a multiclass predictor that uses only a few features; in particular, the number of features used should increase sub-linearly with the number of possible classes. This implies that features should be shared by several classes. We describe and analyze the ShareBoost algorithm for learning a multiclass predictor that uses few shared features. We prove that ShareBoost efficiently finds a predictor that uses few shared features (if such a predictor exists) and that it has a small generalization error. We also describe how to use ShareBoost for learning a non-linear predictor that has a fast evaluation time. In a series of experiments with natural data sets we demonstrate the benefits of ShareBoost and evaluate its success relative to other state-of-the-art approaches.

Using More Data to Speed-up Training Time

Jun 15, 2011

Shai Shalev-Shwartz, Ohad Shamir, Eran Tromer

Abstract: In many recent applications, data is plentiful. By now, we have a rather clear understanding of how more data can be used to improve the accuracy of learning algorithms. Recently, there has been a growing interest in understanding how more data can be leveraged to reduce the required training runtime. In this paper, we study the runtime of learning as a function of the number of available training examples, and underscore the main high-level techniques. We provide some initial positive results showing that the runtime can decrease exponentially while only requiring a polynomial growth of the number of examples, and spell out several interesting open problems.

Large-Scale Convex Minimization with a Low-Rank Constraint

Jun 08, 2011

Shai Shalev-Shwartz, Alon Gonen, Ohad Shamir

Abstract: We address the problem of minimizing a convex function over the space of large matrices with low rank. While this optimization problem is hard in general, we propose an efficient greedy algorithm and derive its formal approximation guarantees. Each iteration of the algorithm involves (approximately) finding the left and right singular vectors corresponding to the largest singular value of a certain matrix, which can be calculated in linear time. This leads to an algorithm which can scale to large matrices arising in several applications such as matrix completion for collaborative filtering and robust low rank matrix approximation.

* ICML 2011
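
The per-iteration step named in the abstract above, approximately finding the leading left and right singular vectors of a matrix, can be illustrated with power iteration. This is only a minimal sketch of one standard routine; the abstract does not say which routine the paper uses, and the function name `top_singular_pair` and its parameters are hypothetical.

```python
import numpy as np

def top_singular_pair(A, num_iters=200, seed=0):
    """Approximate the leading singular triplet (sigma, u, v) of A.

    Assumption: plain power iteration on A^T A; the paper's own
    routine is not specified in the abstract.
    """
    rng = np.random.default_rng(seed)
    # Random unit-norm start for the right singular vector.
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        # One multiplication by A and one by A^T per iteration,
        # so the cost is linear in the number of stored entries of A.
        u = A @ v
        u /= np.linalg.norm(u)
        v = A.T @ u
        v /= np.linalg.norm(v)
    # Recover the singular value and the matching left vector.
    u = A @ v
    sigma = np.linalg.norm(u)
    u /= sigma
    return sigma, u, v
```

Each iteration touches every entry of the matrix a constant number of times, which is consistent with the linear-time claim when the matrix is sparse or otherwise cheap to multiply.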

Regularization Techniques for Learning with Matrices

Oct 17, 2010

Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari

Abstract: There is a growing body of learning problems for which it is natural to organize the parameters into a matrix, so as to appropriately regularize the parameters under some matrix norm (in order to impose some more sophisticated prior knowledge). This work describes and analyzes a systematic method for constructing such matrix-based regularization methods. In particular, we focus on how the underlying statistical properties of a given problem can help us decide which regularization function is appropriate. Our methodology is based on a known duality fact: a function is strongly convex with respect to some norm if and only if its conjugate function is strongly smooth with respect to the dual norm. This result has already been found to be a key component in deriving and analyzing several learning algorithms. We demonstrate the potential of this framework by deriving novel generalization and regret bounds for multi-task learning, multi-class learning, and kernel learning.
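
The duality fact mentioned in the abstract above can be written out as follows. The notation here (the constant $\beta$, the norm $\|\cdot\|$ with dual norm $\|\cdot\|_*$, and the conjugate $f^\star$) is chosen for illustration and may differ from the paper's own.

```latex
% Fenchel conjugate of f:
f^\star(\theta) = \sup_{x} \; \langle \theta, x \rangle - f(x).

% f is \beta-strongly convex w.r.t. \|\cdot\| if, for all x, y and \alpha \in [0,1],
f(\alpha x + (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y)
    - \tfrac{\beta}{2}\,\alpha(1-\alpha)\,\|x - y\|^2 .

% g is \beta-strongly smooth w.r.t. \|\cdot\| if it is differentiable and, for all x, y,
g(x + y) \le g(x) + \langle \nabla g(x), y \rangle + \tfrac{\beta}{2}\,\|y\|^2 .

% Duality (for closed convex f):
f \text{ is } \beta\text{-strongly convex w.r.t. } \|\cdot\|
\iff
f^\star \text{ is } \tfrac{1}{\beta}\text{-strongly smooth w.r.t. } \|\cdot\|_* .

% Sanity check: f(x) = \tfrac12 \|x\|_2^2 is 1-strongly convex w.r.t. \|\cdot\|_2,
% it equals its own conjugate, and it is 1-strongly smooth w.r.t. the (self-dual) \ell_2 norm.
```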

Learning Kernel-Based Halfspaces with the Zero-One Loss

Aug 01, 2010

Shai Shalev-Shwartz, Ohad Shamir, Karthik Sridharan

Abstract: We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the zero-one loss function. Unlike most previous formulations which rely on surrogate convex loss functions (e.g. hinge-loss in SVM and log-loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural zero-one loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time $\mathrm{poly}(\exp(L\log(L/\epsilon)))$, for any distribution, where $L$ is a Lipschitz constant (which can be thought of as the reciprocal of the margin), and the learned classifier is worse than the optimal halfspace by at most $\epsilon$. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in $L$.

* This is a full version of the paper appearing in the 23rd International Conference on Learning Theory (COLT 2010). Compared to the previous arXiv version, this version contains some small corrections in the proof of Lemma 3 and in Appendix A

Online Learning of Noisy Data with Kernels

May 20, 2010

Nicolò Cesa-Bianchi, Shai Shalev-Shwartz, Ohad Shamir

Abstract: We study online learning when individual instances are corrupted by adversarially chosen random noise. We assume the noise distribution is unknown, and may change over time with no restriction other than having zero mean and bounded variance. Our technique relies on a family of unbiased estimators for non-linear functions, which may be of independent interest. We show that a variant of online gradient descent can learn functions in any dot-product (e.g., polynomial) or Gaussian kernel space with any analytic convex loss function. Our variant uses randomized estimates that need to query a random number of noisy copies of each instance, where with high probability this number is upper bounded by a constant. Allowing such multiple queries cannot be avoided: Indeed, we show that online learning is in general impossible when only one noisy copy of each instance can be accessed.

* This is a full version of the paper appearing in the 23rd International Conference on Learning Theory (COLT 2010)
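
A toy case may help show why several independent noisy copies matter for estimating a non-linear function. The sketch below is not the paper's estimator: it assumes a scalar instance, Gaussian noise, and the single target function x^2, and the function name `sample_estimates` is hypothetical. Squaring one noisy copy overestimates x^2 by the noise variance, while the product of two independent copies is unbiased.

```python
import numpy as np

def sample_estimates(x, sigma, trials, seed=0):
    """Compare two Monte Carlo estimates of x**2 from noisy copies of x.

    Assumption: zero-mean Gaussian noise with standard deviation sigma,
    chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    # Two independent noisy copies of the same instance per trial.
    a = x + sigma * rng.standard_normal(trials)
    b = x + sigma * rng.standard_normal(trials)
    one_copy_mean = float(np.mean(a * a))    # E[a^2] = x^2 + sigma^2 (biased)
    two_copies_mean = float(np.mean(a * b))  # E[a*b] = x^2 (unbiased)
    return one_copy_mean, two_copies_mean
```

With `x = 2.0` and `sigma = 1.0`, the first average concentrates around x^2 + sigma^2 and the second around x^2, which matches the abstract's point that a single noisy copy is not enough in general.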

Efficient Learning with Partially Observed Attributes

Apr 28, 2010

Nicolò Cesa-Bianchi, Shai Shalev-Shwartz, Ohad Shamir

Abstract: We describe and analyze efficient algorithms for learning a linear predictor from examples when the learner can only view a few attributes of each training example. This is the case, for instance, in medical research, where each patient participating in the experiment is only willing to go through a small number of tests. Our analysis bounds the number of additional examples sufficient to compensate for the lack of full information on each training example. We demonstrate the efficiency of our algorithms by showing that when running on digit recognition data, they obtain a high prediction accuracy even when the learner gets to see only four pixels of each image.

* This is a full version of the paper appearing in The 27th International Conference on Machine Learning (ICML 2010)
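
One simple way to see how a learner could work with only a few attributes per example is to build an unbiased estimate of the full example from a random subset of its coordinates. The sketch below assumes uniform sampling without replacement and rescaling by d/k; the paper's actual sampling scheme may differ, and the name `partial_view_estimate` is hypothetical.

```python
import numpy as np

def partial_view_estimate(x, k, rng):
    """Estimate x from k of its d attributes, chosen uniformly at random.

    Assumption: sampling without replacement. Each attribute is observed
    with probability k/d, so scaling observed values by d/k makes the
    estimate unbiased: E[x_hat] = x.
    """
    d = x.shape[0]
    idx = rng.choice(d, size=k, replace=False)  # attributes the learner views
    x_hat = np.zeros(d)
    x_hat[idx] = (d / k) * x[idx]               # unobserved attributes stay 0
    return x_hat
```

Each individual estimate is noisy, which fits the abstract's statement that additional examples are needed to compensate for the missing information.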
