Lester Mackey

Jet-Images -- Deep Learning Edition

Jan 22, 2017

Luke de Oliveira, Michael Kagan, Lester Mackey, Benjamin Nachman, Ariel Schwartzman

Abstract: Building on the notion of a particle physics detector as a camera and the collimated streams of high energy particles, or jets, it measures as an image, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons. Modern deep learning algorithms trained on jet images can out-perform standard physically-motivated feature driven approaches to jet tagging. We develop techniques for visualizing how these features are learned by the network and what additional information is used to improve performance. This interplay between physically-motivated feature driven tools and supervised learning algorithms is general and can be used to significantly increase the sensitivity to discover new particles and new forces, and gain a deeper understanding of the physics within jets.

* JHEP 07 (2016) 069
* 32 pages, 24 figures. Version that is published in JHEP

Weighted Classification Cascades for Optimizing Discovery Significance in the HiggsML Challenge

Sep 10, 2015

Lester Mackey, Jordan Bryan, Man Yue Mo

Abstract: We introduce a minorization-maximization approach to optimizing common measures of discovery significance in high energy physics. The approach alternates between solving a weighted binary classification problem and updating class weights in a simple, closed-form manner. Moreover, an argument based on convex duality shows that an improvement in weighted classification error on any round yields a commensurate improvement in discovery significance. We complement our derivation with experimental results from the 2014 Higgs boson machine learning challenge.
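
As a rough illustration of what a "measure of discovery significance" looks like, the sketch below computes the approximate median significance (AMS), the objective used in the 2014 HiggsML challenge. The abstract does not name the measure, so treating AMS as the target is an assumption here, as are the function names `ams` and `selected_counts` and the regularization default `b_reg=10.0`. This does not implement the minorization-maximization procedure itself; it only evaluates the quantity such a procedure would aim to improve.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance for weighted signal s and background b.

    b_reg is a regularization term added to the background count
    (assumed default: 10, the value used in the HiggsML challenge).
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    total_b = b + b_reg
    return math.sqrt(2.0 * ((s + total_b) * math.log(1.0 + s / total_b) - s))


def selected_counts(weights, labels, selected):
    """Sum event weights of true signal (label 1) and background (label 0)
    among the events a classifier selected as signal."""
    s = sum(w for w, y, sel in zip(weights, labels, selected) if sel and y == 1)
    b = sum(w for w, y, sel in zip(weights, labels, selected) if sel and y == 0)
    return s, b


if __name__ == "__main__":
    # Hypothetical toy events: weights, true labels, classifier selections.
    weights = [2.0, 3.0, 40.0, 50.0, 5.0]
    labels = [1, 1, 0, 0, 1]
    selected = [True, True, True, False, False]
    s, b = selected_counts(weights, labels, selected)
    print(f"s={s}, b={b}, AMS={ams(s, b):.4f}")
```

For small `s` relative to `b + b_reg`, the value is close to `s / sqrt(b + b_reg)`, which is why the measure rewards selections that keep signal while rejecting background.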

Fuzzy Jets

Sep 07, 2015

Lester Mackey, Benjamin Nachman, Ariel Schwartzman, Conrad Stansbury

Abstract: Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets. To construct jets, the experimental collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative hierarchical clustering schemes known as sequential recombination. We propose a new class of algorithms for clustering jets that use infrared and collinear safe mixture models. These new algorithms, known as fuzzy jets, are clustered using maximum likelihood techniques and can dynamically determine various properties of jets like their size. We show that the fuzzy jet size adds additional information to conventional jet tagging variables. Furthermore, we study the impact of pileup and show that with some slight modifications to the algorithm, fuzzy jets can be stable up to high pileup interaction multiplicities.

* JHEP 06 (2016) 010

Corrupted Sensing: Novel Guarantees for Separating Structured Signals

Feb 04, 2014

Rina Foygel, Lester Mackey

Abstract: We study the problem of corrupted sensing, a generalization of compressed sensing in which one aims to recover a signal from a collection of corrupted or unreliable measurements. While an arbitrary signal cannot be recovered in the face of arbitrary corruption, tractable recovery is possible when both signal and corruption are suitably structured. We quantify the relationship between signal recovery and two geometric measures of structure, the Gaussian complexity of a tangent cone and the Gaussian distance to a subdifferential. We take a convex programming approach to disentangling signal and corruption, analyzing both penalized programs that trade off between signal and corruption complexity, and constrained programs that bound the complexity of signal or corruption when prior information is available. In each case, we provide conditions for exact signal recovery from structured corruption and stable signal recovery from structured corruption with added unstructured noise. Our simulations demonstrate close agreement between our theoretical recovery bounds and the sharp phase transitions observed in practice. In addition, we provide new interpretable bounds for the Gaussian complexity of sparse vectors, block-sparse vectors, and low-rank matrices, which lead to sharper guarantees of recovery when combined with our results and those in the literature.

* IEEE Transactions on Information Theory, 60(2): 1223-1247 (2014)
* http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6712045

The asymptotics of ranking algorithms

Nov 26, 2013

John C. Duchi, Lester Mackey, Michael I. Jordan

Abstract: We consider the predictive problem of supervised ranking, where the task is to rank sets of candidate items returned in response to queries. Although there exist statistical procedures that come with guarantees of consistency in this setting, these procedures require that individuals provide a complete ranking of all items, which is rarely feasible in practice. Instead, individuals routinely provide partial preference information, such as pairwise comparisons of items, and more practical approaches to ranking have aimed at modeling this partial preference data directly. As we show, however, such an approach raises serious theoretical challenges. Indeed, we demonstrate that many commonly used surrogate losses for pairwise comparison data do not yield consistency; surprisingly, we show inconsistency even in low-noise settings. With these negative results as motivation, we present a new approach to supervised ranking based on aggregation of partial preferences, and we develop $U$-statistic-based empirical risk minimization procedures. We present an asymptotic analysis of these new procedures, showing that they yield consistency results that parallel those available for classification. We complement our theoretical results with an experiment studying the new procedures in a large-scale web-ranking task.

* Annals of Statistics 2013, Vol. 41, No. 5, 2292-2323
* Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOS1142

Distributed Matrix Completion and Robust Factorization

Oct 28, 2013

Lester Mackey, Ameet Talwalkar, Michael I. Jordan

Abstract: If learning methods are to scale to the massive sizes of modern datasets, it is essential for the field of machine learning to embrace parallel and distributed computing. Inspired by the recent development of matrix factorization methods with rich theory but poor computational complexity and by the relative ease of mapping matrices onto distributed architectures, we introduce a scalable divide-and-conquer framework for noisy matrix factorization. We present a thorough theoretical analysis of this framework in which we characterize the statistical errors introduced by the "divide" step and control their magnitude in the "conquer" step, so that the overall algorithm enjoys high-probability estimation guarantees comparable to those of its base algorithm. We also present experiments in collaborative filtering and video background modeling that demonstrate the near-linear to superlinear speed-ups attainable with this approach.

* 35 pages, 6 figures
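
The divide-and-conquer pattern described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the matrix is assumed fully observed, a truncated SVD stands in for the base factorization algorithm, the "conquer" step is assumed to be a projection of every block estimate onto the column space of the first one, and the names `truncated_svd` and `divide_factor_combine` are hypothetical. In a real deployment the per-block factorizations would run in parallel.

```python
import numpy as np


def truncated_svd(M, rank):
    """Stand-in base algorithm: best rank-`rank` approximation of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def divide_factor_combine(M, n_blocks, rank):
    """Divide columns into blocks, factor each block, then combine.

    Assumption: the combine step projects all block estimates onto the
    column space of the first block's estimate.
    """
    # Divide: split the columns into roughly equal blocks.
    blocks = np.array_split(M, n_blocks, axis=1)
    # Factor: run the base algorithm on each block independently.
    estimates = [truncated_svd(B, rank) for B in blocks]
    # Conquer: orthonormal basis for the first estimate's column space.
    U, _, _ = np.linalg.svd(estimates[0], full_matrices=False)
    Q = U[:, :rank]
    return np.hstack([Q @ (Q.T @ E) for E in estimates])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical rank-3 matrix with a little additive noise.
    L = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 40))
    M = L + 0.01 * rng.standard_normal(L.shape)
    L_hat = divide_factor_combine(M, n_blocks=4, rank=3)
    rel_err = np.linalg.norm(L_hat - L) / np.linalg.norm(L)
    print(f"relative error: {rel_err:.4f}")
```

Each block factorization touches only a fraction of the columns, which is where the speed-up comes from; the combine step is what ties the independent block estimates back to a single low-rank matrix.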

Distributed Low-rank Subspace Segmentation

Oct 16, 2013

Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan

Abstract: Vision problems ranging from image clustering to motion segmentation to semi-supervised learning can naturally be framed as subspace segmentation problems, in which one aims to recover multiple low-dimensional subspaces from noisy and corrupted input data. Low-Rank Representation (LRR), a convex formulation of the subspace segmentation problem, is provably and empirically accurate on small problems but does not scale to the massive sizes of modern vision datasets. Moreover, past work aimed at scaling up low-rank matrix factorization is not applicable to LRR given its non-decomposable constraints. In this work, we propose a novel divide-and-conquer algorithm for large-scale subspace segmentation that can cope with LRR's non-decomposable constraints and maintains LRR's strong recovery guarantees. This has immediate implications for the scalability of subspace segmentation, which we demonstrate on a benchmark face recognition dataset and in simulations. We then introduce novel applications of LRR-based subspace segmentation to large-scale semi-supervised learning for multimedia event detection, concept detection, and image tagging. In each case, we obtain state-of-the-art results and order-of-magnitude speed ups.

Combinatorial clustering and the beta negative binomial process

Jun 10, 2013

Tamara Broderick, Lester Mackey, John Paisley, Michael I. Jordan

Abstract: We develop a Bayesian nonparametric approach to a general family of latent class problems in which individuals can belong simultaneously to multiple classes and where each class can be exhibited multiple times by an individual. We introduce a combinatorial stochastic process known as the negative binomial process (NBP) as an infinite-dimensional prior appropriate for such problems. We show that the NBP is conjugate to the beta process, and we characterize the posterior distribution under the beta-negative binomial process (BNBP) and hierarchical models based on the BNBP (the HBNBP). We study the asymptotic properties of the BNBP and develop a three-parameter extension of the BNBP that exhibits power-law behavior. We derive MCMC algorithms for posterior inference under the HBNBP, and we present experiments using these algorithms in the domains of image segmentation, object recognition, and document analysis.

* 56 pages, 4 figures, 6 tables

Feature-Weighted Linear Stacking

Nov 04, 2009

Joseph Sill, Gabor Takacs, Lester Mackey, David Lin

Abstract: Ensemble methods, such as stacking, are designed to boost predictive accuracy by blending the predictions of multiple machine learning models. Recent work has shown that the use of meta-features, additional inputs describing each example in a dataset, can boost the performance of ensemble methods, but the greatest reported gains have come from nonlinear procedures requiring significant tuning and training time. Here, we present a linear technique, Feature-Weighted Linear Stacking (FWLS), that incorporates meta-features for improved accuracy while retaining the well-known virtues of linear regression regarding speed, stability, and interpretability. FWLS combines model predictions linearly using coefficients that are themselves linear functions of meta-features. This technique was a key facet of the solution of the second place team in the recently concluded Netflix Prize competition. Significant increases in accuracy over standard linear stacking are demonstrated on the Netflix Prize collaborative filtering dataset.

* 17 pages, 1 figure, 2 tables
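
The blending rule stated in the abstract above (model predictions combined linearly, with coefficients that are linear functions of meta-features) reduces to ordinary linear regression on the pairwise products of meta-features and model predictions. The sketch below shows that reduction under stated assumptions: the function names `fwls_design`, `fwls_fit`, and `fwls_predict` are hypothetical, and the small ridge term is an illustrative choice for numerical stability, not a setting taken from the paper.

```python
import numpy as np


def fwls_design(preds, meta):
    """Build the FWLS design matrix.

    preds: (n, L) predictions of L base models.
    meta:  (n, M) meta-features (include a constant column of ones to
           recover standard linear stacking as a special case).
    Returns an (n, L*M) matrix whose columns are products pred_i * meta_j.
    """
    n, n_models = preds.shape
    n_meta = meta.shape[1]
    return (preds[:, :, None] * meta[:, None, :]).reshape(n, n_models * n_meta)


def fwls_fit(preds, meta, y, ridge=1e-8):
    """Fit blending coefficients by (lightly regularized) least squares."""
    X = fwls_design(preds, meta)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def fwls_predict(preds, meta, coef):
    """Blend base-model predictions using the fitted coefficients."""
    return fwls_design(preds, meta) @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    preds = rng.standard_normal((n, 2))                         # two base models
    meta = np.column_stack([np.ones(n), rng.uniform(size=n)])   # constant + one meta-feature
    true_coef = np.array([0.6, -0.2, 0.4, 0.3])                 # hypothetical ground truth
    y = fwls_design(preds, meta) @ true_coef + 0.01 * rng.standard_normal(n)
    coef = fwls_fit(preds, meta, y)
    print(np.round(coef, 2))
```

Because the model stays linear in its parameters, fitting costs a single least-squares solve, which is the speed and stability advantage the abstract attributes to linear regression.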
