Christopher Ré

Department of Computer Science, Stanford University

Hypertree Decompositions Revisited for PGMs

Jul 02, 2018

Aarthy Shivram Arun, Sai Vikneshwar Mani Jayaraman, Christopher Ré, Atri Rudra

Abstract: We revisit the classical problem of exact inference on probabilistic graphical models (PGMs). Our algorithm is based on recent worst-case optimal database join algorithms, which can be asymptotically faster than traditional data processing methods. We present the first empirical evaluation of these algorithms via JoinInfer, a new exact inference engine. We empirically explore the properties of the data for which our engine can be expected to outperform traditional inference engines, refining current theoretical notions. Further, JoinInfer outperforms existing state-of-the-art inference engines (ACE, IJGP and libDAI) on some standard benchmark datasets by up to a factor of 630x. Finally, we propose a promising data-driven heuristic that extends JoinInfer to automatically tailor its parameters and/or switch to the traditional inference algorithms.
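
Exact inference multiplies factors that share variables, which is a natural join. Worst-case optimal algorithms such as Generic Join bind one variable at a time and intersect the candidate values from every relation that mentions it, rather than joining two relations at a time. The sketch below assumes the textbook triangle query Q(a, b, c) = R(a, b) ⋈ S(b, c) ⋈ T(a, c) over relations given as lists of pairs; the function names are hypothetical and this is not JoinInfer's code.

```python
from collections import defaultdict


def index(pairs):
    """Map each first-column value to the set of second-column values."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return dict(idx)


def intersect(xs, ys):
    """Intersect two collections by scanning the smaller and probing the larger."""
    if len(xs) > len(ys):
        xs, ys = ys, xs
    return [v for v in xs if v in ys]


def triangles(R, S, T):
    """Generic-join-style evaluation of Q(a, b, c) = R(a, b), S(b, c), T(a, c).

    Variables are bound in the order a, b, c; each step intersects the
    candidate values from every relation that contains that variable.
    """
    R_a, S_b, T_a = index(R), index(S), index(T)
    out = []
    for a in intersect(R_a, T_a):                # a appears in R and in T
        for b in intersect(R_a[a], S_b):         # (a, b) in R and b appears in S
            for c in intersect(S_b[b], T_a[a]):  # (b, c) in S and (a, c) in T
                out.append((a, b, c))
    return sorted(out)


R = [(1, 2), (2, 3), (1, 3)]
S = [(2, 3), (3, 1), (3, 4)]
T = [(1, 3), (2, 1), (1, 4)]
print(triangles(R, S, T))  # [(1, 2, 3), (1, 3, 4), (2, 3, 1)]
```

Scanning the smaller side in `intersect` keeps each step proportional to the smaller candidate set; a pairwise plan would first materialize R ⋈ S, which can be much larger than the final output.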

* Accepted for StarAI Proceedings. Camera-ready version of arXiv:1804.01640

Representation Tradeoffs for Hyperbolic Embeddings

Apr 24, 2018

Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala

Abstract: Hyperbolic embeddings offer excellent quality with few dimensions when embedding hierarchical data structures like synonym or type hierarchies. Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization. On WordNet, our combinatorial embedding obtains a mean average precision (MAP) of 0.989 with only two dimensions, while Nickel et al.'s recent construction obtains 0.87 using 200 dimensions. We provide upper and lower bounds that allow us to characterize the precision-dimensionality tradeoff inherent in any hyperbolic embedding. To embed general metric spaces, we propose a hyperbolic generalization of multidimensional scaling (h-MDS). We show how to perform exact recovery of hyperbolic points from distances, provide a perturbation analysis, and give a recovery result that allows us to reduce dimensionality. The h-MDS approach offers consistently low distortion even with few dimensions across several datasets. Finally, we extract lessons from the algorithms and theory above to design a PyTorch-based implementation that can handle incomplete information and is scalable.
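
The embeddings above live in hyperbolic space. A common concrete model is the Poincaré ball, where distance grows without bound as points approach the boundary, leaving room for exponentially many leaves of a tree. A minimal sketch of the Poincaré distance in plain Python (the function name is illustrative, not taken from the paper's implementation):

```python
import math


def poincare_distance(u, v):
    """Distance between points strictly inside the unit ball (Poincare model):
    d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2) (1 - |v|^2)))."""
    def sq_norm(x):
        return sum(xi * xi for xi in x)

    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


origin = (0.0, 0.0)
print(poincare_distance(origin, (0.5, 0.0)))   # ln 3, about 1.0986
print(poincare_distance(origin, (0.99, 0.0)))  # ln 199, about 5.2933
```

For two antipodal points at the same radius, the geodesic passes through the origin, so their distance is exactly twice each point's distance from the origin. This mirrors a path through a tree's root and is one reason trees embed well in this space.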

A Kernel Theory of Modern Data Augmentation

Mar 16, 2018

Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

Abstract: Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding modern data augmentation techniques. We start by showing that for kernel classifiers, data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. We connect this general approximation framework to prior work in invariant kernels, tangent propagation, and robust optimization. Next, we explicitly tackle the compositional aspect of modern data augmentation techniques, proposing a novel model of data augmentation as a Markov process. Under this model, we show that performing $k$-nearest neighbors with data augmentation is asymptotically equivalent to a kernel classifier. Finally, we illustrate ways in which our theoretical framework can be leveraged to accelerate machine learning workflows in practice, including reducing the amount of computation needed to train on augmented data, and predicting the utility of a transformation prior to training.
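
As a rough illustration of the first-order feature-averaging component, assume a finite set of augmentations sampled uniformly. Averaging the feature map over augmentations yields the kernel below: the mean of k(t(x), s(y)) over all pairs of transformations (t, s). The helper names and the reversal example are assumptions, and the sketch omits the second-order variance-regularization term.

```python
from itertools import product


def averaged_kernel(k, transforms, x, y):
    """First-order (feature-averaging) kernel for a finite set of augmentations
    sampled uniformly: the mean of k(t(x), s(y)) over all pairs (t, s)."""
    pairs = list(product(transforms, repeat=2))
    return sum(k(t(x), s(y)) for t, s in pairs) / len(pairs)


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def identity(x):
    return list(x)


def reverse(x):  # e.g. a horizontal flip of a 1-D signal
    return list(reversed(x))


x, y = [1, 2, 3], [1, 0, 0]
print(averaged_kernel(linear_kernel, [identity], x, y))           # 1.0
print(averaged_kernel(linear_kernel, [identity, reverse], x, y))  # 2.0
```

With the group {identity, reversal}, the averaged kernel is unchanged when either input is reversed, which is the sense in which feature averaging builds the augmentation's invariance into the classifier.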

High-Accuracy Low-Precision Training

Mar 09, 2018

Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

Abstract: Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference rather than training. Previous low-precision training algorithms suffered from a fundamental tradeoff: as the number of bits of precision is lowered, quantization noise is added to the model, which limits statistical accuracy. To address this issue, we describe a simple low-precision stochastic gradient descent variant called HALP. HALP converges at the same theoretical rate as full-precision algorithms despite the noise introduced by using low precision throughout execution. The key idea is to use SVRG to reduce gradient variance, and to combine this with a novel technique called bit centering to reduce quantization error. We show that on the CPU, HALP can run up to $4\times$ faster than full-precision SVRG and can match its convergence trajectory. We implemented HALP in TensorQuant, and show that it exceeds the validation performance of plain low-precision SGD on two deep learning tasks.
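
HALP combines SVRG with bit centering. The sketch below isolates only the quantization side under stated assumptions: unbiased stochastic rounding onto a signed b-bit fixed-point grid, and a hypothetical `bit_centered` helper that keeps a full-precision center and quantizes only the offset, on a grid sized to an assumed bound on that offset. It omits the SVRG loop and whatever rule HALP uses to choose the scale.

```python
import math
import random


def quantize(x, scale, bits, rng):
    """Unbiased stochastic rounding of x onto the signed fixed-point grid
    {q * scale : q integer, |q| <= 2**(bits - 1) - 1}; values outside are clamped."""
    qmax = 2 ** (bits - 1) - 1
    y = x / scale
    lo = math.floor(y)
    q = lo + (1 if rng.random() < y - lo else 0)  # round up with prob. frac(y)
    q = max(-qmax, min(qmax, q))
    return q * scale


def bit_centered(center, offset, offset_bound, bits, rng):
    """Hypothetical bit-centering helper: `center` stays in full precision and
    only `offset` is quantized, on a grid spanning [-offset_bound, offset_bound]."""
    qmax = 2 ** (bits - 1) - 1
    return center + quantize(offset, offset_bound / qmax, bits, rng)


rng = random.Random(0)
w = 0.5003
direct = quantize(w, 1.0 / 127, 8, rng)              # 8-bit grid over [-1, 1]
centered = bit_centered(0.5, w - 0.5, 1e-3, 8, rng)  # 8-bit grid over 0.5 +/- 0.001
print(abs(direct - w), abs(centered - w))
```

Because the offset's grid is much finer than a grid covering the whole weight range, the same 8 bits represent 0.5003 far more accurately when it is stored as 0.5 plus a quantized offset.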

Gaussian Quadrature for Kernel Features

Jan 31, 2018

Tri Dao, Christopher De Sa, Christopher Ré

Abstract: Kernel methods have recently attracted resurgent interest, showing performance competitive with deep neural networks in tasks such as speech recognition. The random Fourier features map is a technique commonly used to scale up kernel machines, but employing the randomized feature map means that $O(\epsilon^{-2})$ samples are required to achieve an approximation error of at most $\epsilon$. We investigate some alternative schemes for constructing feature maps that are deterministic, rather than random, by approximating the kernel in the frequency domain using Gaussian quadrature. We show that deterministic feature maps can be constructed, for any $\gamma > 0$, to achieve error $\epsilon$ with $O(e^{e^\gamma} + \epsilon^{-1/\gamma})$ samples as $\epsilon$ goes to 0. Our method works particularly well with sparse ANOVA kernels, which are inspired by the convolutional layer of CNNs. We validate our methods on datasets in different domains, such as MNIST and TIMIT, showing that deterministic features are faster to generate and achieve accuracy comparable to the state-of-the-art kernel methods based on random Fourier features.
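
To make the quadrature idea concrete in one dimension, consider the Gaussian kernel k(x, y) = exp(-(x - y)^2 / 2). Its spectral density is the standard normal, so k(x, y) = E[cos(ω(x - y))] with ω ~ N(0, 1). Replacing that expectation with an n-point Gauss-Hermite rule gives deterministic frequencies ω_i = √2 t_i and weights a_i = w_i / √π, and the features [√a_i cos(ω_i x), √a_i sin(ω_i x)] reproduce the quadrature rule exactly in their inner product. This is a 1-D sketch with hypothetical names; the paper's higher-dimensional constructions (including sparse ANOVA kernels) are not shown.

```python
import math


def gauss_hermite(n, max_iter=100):
    """Nodes t_i and weights w_i with sum_i w_i f(t_i) ~= integral of exp(-t^2) f(t).

    Newton's method on orthonormal Hermite polynomials, following the classic
    'gauher' routine from Numerical Recipes.
    """
    pim4 = math.pi ** -0.25
    nodes, weights = [0.0] * n, [0.0] * n
    z = 0.0
    for i in range((n + 1) // 2):
        # Initial guess for the i-th largest root (Numerical Recipes heuristics).
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** -0.16667
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * nodes[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * nodes[1]
        else:
            z = 2.0 * z - nodes[i - 2]
        for _ in range(max_iter):
            # Recurrence: p1 ends as the normalized H_n(z), p2 as H_{n-1}(z).
            p1, p2 = pim4, 0.0
            for j in range(n):
                p3, p2 = p2, p1
                p1 = z * math.sqrt(2.0 / (j + 1)) * p2 - math.sqrt(j / (j + 1)) * p3
            dp = math.sqrt(2.0 * n) * p2  # derivative of the normalized H_n at z
            z_prev, z = z, z - p1 / dp
            if abs(z - z_prev) <= 1e-14:
                break
        nodes[i], nodes[n - 1 - i] = z, -z
        weights[i] = weights[n - 1 - i] = 2.0 / (dp * dp)
    return nodes, weights


def make_feature_map(n=20):
    """Deterministic features for the 1-D kernel k(x, y) = exp(-(x - y)^2 / 2)."""
    t, w = gauss_hermite(n)
    omegas = [math.sqrt(2.0) * ti for ti in t]               # frequencies
    amps = [math.sqrt(wi / math.sqrt(math.pi)) for wi in w]  # sqrt of a_i

    def phi(x):
        feats = []
        for a, om in zip(amps, omegas):
            feats += [a * math.cos(om * x), a * math.sin(om * x)]
        return feats

    return phi


phi = make_feature_map(20)
x, y = 0.3, 1.1
approx = sum(p * q for p, q in zip(phi(x), phi(y)))
print(approx, math.exp(-(x - y) ** 2 / 2))
```

Unlike random Fourier features, the frequencies here are fixed by the quadrature rule, so the same inputs always produce the same features.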

* Neural Information Processing Systems (NIPS) 2017

Snorkel: Rapid Training Data Creation with Weak Supervision

Nov 28, 2017

Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré

Abstract: Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8x faster and increase predictive performance by an average of 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8x speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
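
Labeling functions are ordinary programs that vote on a label or abstain. The sketch below assumes a toy spouse-relation task with hypothetical input fields (`person1`, `person2`, `between`) and uses plain Python rather than Snorkel's own API. It builds the label matrix and combines votes by unweighted majority, a simple baseline; Snorkel instead fits a generative model that estimates each function's accuracy without ground truth.

```python
ABSTAIN, NEG, POS = -1, 0, 1


def lf_spouse_word(x):
    words = x["between"].split()
    return POS if "wife" in words or "husband" in words else ABSTAIN


def lf_sibling_word(x):
    words = x["between"].split()
    return NEG if "brother" in words or "sister" in words else ABSTAIN


def lf_same_person(x):
    return NEG if x["person1"] == x["person2"] else ABSTAIN


def label_matrix(lfs, examples):
    """One row per example, one column per labeling function."""
    return [[lf(x) for lf in lfs] for x in examples]


def majority_vote(row):
    """Unweighted baseline: majority among non-abstaining votes; ties abstain."""
    votes = [v for v in row if v != ABSTAIN]
    pos, neg = votes.count(POS), votes.count(NEG)
    if pos > neg:
        return POS
    if neg > pos:
        return NEG
    return ABSTAIN


examples = [
    {"person1": "Alice", "person2": "Bob", "between": "and her husband"},
    {"person1": "Carol", "person2": "Dan", "between": "and her brother"},
    {"person1": "Eve", "person2": "Eve", "between": "met"},
    {"person1": "Fay", "person2": "Gus", "between": "visited"},
]
L = label_matrix([lf_spouse_word, lf_sibling_word, lf_same_person], examples)
print([majority_vote(row) for row in L])  # [1, 0, 0, -1]
```

Majority vote treats every function as equally reliable; the point of a learned model over the label matrix is to down-weight noisy or redundant functions.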

* Proceedings of the VLDB Endowment, 11(3), 269-282, 2017

Learning to Compose Domain-Specific Transformations for Data Augmentation

Sep 30, 2017

Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré

Abstract: Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual transformations, constructing and tuning the more sophisticated compositions typically needed to achieve state-of-the-art results is a time-consuming manual task in practice. We propose a method for automating this process by learning a generative sequence model over user-specified transformation functions using a generative adversarial approach. Our method can make use of arbitrary, non-deterministic transformation functions, is robust to misspecified user input, and is trained on unlabeled data. The learned transformation model can then be used to perform data augmentation for any end discriminative model. In our experiments, we show the efficacy of our approach on both image and text datasets, achieving improvements of 4.0 accuracy points on CIFAR-10, 1.4 F1 points on the ACE relation extraction task, and 3.4 accuracy points when using domain-specific transformation operations on a medical imaging dataset as compared to standard heuristic augmentation approaches.

* To appear at Neural Information Processing Systems (NIPS) 2017

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

Sep 28, 2017

Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré

Abstract: A challenge in training discriminative models like neural networks is obtaining enough labeled training data. Recent approaches use generative models to combine weak supervision sources, like user-defined heuristics or knowledge bases, to label training data. Prior work has explored learning accuracies for these sources even without ground truth labels, but it assumes that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set. In particular, these approaches fail to model latent subsets in the training data in which the supervision sources perform differently than on average. We present Socratic learning, a paradigm that uses feedback from a corresponding discriminative model to automatically identify these subsets and augments the structure of the generative model accordingly. Experimentally, we show that without any ground truth labels, the augmented generative model reduces error by up to 56.06% for a relation extraction task compared to a state-of-the-art weak supervision technique that utilizes generative models.

* 4 figures; 18 pages

Learning the Structure of Generative Models without Labeled Data

Sep 09, 2017

Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré

Abstract: Curating labeled training data has become the primary bottleneck in machine learning. Recent frameworks address this bottleneck with generative models to synthesize labels at scale from weak supervision sources. The generative model's dependency structure directly affects the quality of the estimated labels, but selecting a structure automatically without any labeled data is a distinct challenge. We propose a structure estimation method that maximizes the $\ell_1$-regularized marginal pseudolikelihood of the observed data. Our analysis shows that the amount of unlabeled data required to identify the true structure scales sublinearly in the number of possible dependencies for a broad class of models. Simulations show that our method is 100$\times$ faster than a maximum likelihood approach and selects $1/4$ as many extraneous dependencies. We also show that our method provides an average of 1.5 F1 points of improvement over existing, user-developed information extraction applications on real-world data such as PubMed journal abstracts.

* Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017

Inferring Generative Model Structure with Static Analysis

Sep 07, 2017

Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, Christopher Ré

Abstract: Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects training label quality, but is difficult to learn without any ground truth labels. We instead rely on these weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus significantly reducing the data required to learn structure. We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and number of relations found, improving over the standard sample complexity, which is exponential in $n$ for identifying $n$th-degree relations. Experimentally, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence outperforms a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.
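
Coral's premise is that heuristics written as code expose which primitives they read. A simplified sketch of that step using Python's standard `ast` module: collect the primitive names each heuristic references and report pairs that share one, as candidate dependencies for the generative model. The heuristic sources and primitive names are hypothetical, and this covers only the shared-primitive analysis, not Coral's full structure inference.

```python
import ast
from itertools import combinations


def primitives_used(source, primitives):
    """Primitive names that a heuristic's source code reads."""
    tree = ast.parse(source)
    return {node.id for node in ast.walk(tree)
            if isinstance(node, ast.Name) and node.id in primitives}


def shared_primitive_pairs(heuristics, primitives):
    """Pairs of heuristics that read at least one common primitive; these are
    candidate dependencies rather than independent votes."""
    used = {name: primitives_used(src, primitives) for name, src in heuristics.items()}
    return [(a, b) for a, b in combinations(sorted(used), 2) if used[a] & used[b]]


# Hypothetical heuristics over hypothetical image primitives.
heuristics = {
    "h_area": "def h_area(): return 1 if area > 100 else 0",
    "h_ratio": "def h_ratio(): return 1 if area / perimeter > 2 else 0",
    "h_intensity": "def h_intensity(): return 1 if intensity > 0.5 else 0",
}
primitives = {"area", "perimeter", "intensity"}
print(shared_primitive_pairs(heuristics, primitives))  # [('h_area', 'h_ratio')]
```

The analysis needs only the source text, not labeled data, which is why inferring structure this way can reduce how much data structure learning requires.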

* NIPS 2017
