Pradeep Ravikumar

Minimizing FLOPs to Learn Efficient Sparse Representations

Apr 12, 2020
Biswajit Paria, Chih-Kuan Yeh, Ian E. H. Yen, Ning Xu, Pradeep Ravikumar, Barnabás Póczos

Deep representation learning has become one of the most widely adopted approaches for visual search, recommendation, and identification. Retrieving such representations from a large database is, however, computationally challenging. Approximate methods based on learning compact representations, such as locality sensitive hashing, product quantization, and PCA, have been widely explored for this problem. In this work, in contrast to learning compact representations, we propose to learn high-dimensional, sparse representations whose representational capacity is similar to that of dense embeddings while being more efficient, because sparse matrix multiplication can be much faster than dense multiplication. Following the key insight that the number of operations decreases quadratically with the sparsity of embeddings, provided the non-zero entries are distributed uniformly across dimensions, we propose a novel approach to learn such distributed sparse embeddings via a carefully constructed regularization function that directly minimizes a continuous relaxation of the number of floating-point operations (FLOPs) incurred during retrieval. Our experiments show that our approach is competitive with the other baselines and yields a similar or better speed-vs-accuracy tradeoff on practical datasets.

* Published at ICLR 2020
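
The quadratic saving described in the abstract above can be illustrated with a small sketch. It assumes, as a simplification the abstract does not spell out, that a query and a database embedding each activate dimension j independently with probability p_j, so a sparse dot product costs roughly the sum of p_j squared multiplications in expectation. The surrogate replaces p_j with the mean absolute activation over a batch, which is one plausible continuous relaxation; the function names are hypothetical and the paper's exact regularizer may differ.

```python
def expected_flops(activation_probs):
    """Expected multiplications in one sparse dot product, assuming both
    vectors activate dimension j independently with probability p_j."""
    return sum(p * p for p in activation_probs)


def flops_surrogate(batch):
    """Continuous surrogate: sum over dimensions of the squared mean
    absolute activation (assumed form, for illustration only)."""
    n = len(batch)
    dims = len(batch[0])
    total = 0.0
    for j in range(dims):
        mean_abs = sum(abs(row[j]) for row in batch) / n
        total += mean_abs ** 2
    return total


# Both layouts have 4 non-zeros per vector on average, spread
# uniformly vs. concentrated in a few dimensions.
uniform = [0.04] * 100                  # 100 dims, each active 4% of the time
concentrated = [1.0] * 4 + [0.0] * 96   # 4 dims always active

print(expected_flops(uniform))        # about 0.16
print(expected_flops(concentrated))   # 4.0
```

With the same average number of non-zeros, the uniform layout needs far fewer expected multiplications, which is why a regularizer of this kind pushes toward evenly distributed sparsity rather than sparsity alone.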

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

Feb 15, 2020
Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang

Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time-consuming. In this paper, we propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable l2-defenses. Recent work shows that randomized smoothing can be used to provide a certified l2 radius to smoothed classifiers, and our algorithm trains provably robust smoothed classifiers via MAximizing the CErtified Radius (MACER). The attack-free characteristic makes MACER faster to train and easier to optimize. In our experiments, we show that our method can be applied to modern deep neural networks on a wide range of datasets, including CIFAR-10, ImageNet, MNIST, and SVHN. For all tasks, MACER requires less training time than state-of-the-art adversarial training algorithms, and the learned models achieve a larger average certified radius.

* In ICLR 2020. 20 Pages
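
To make the "certified l2 radius" concrete, the sketch below evaluates the radius commonly used in the randomized smoothing literature for Gaussian noise: half of sigma times the gap between the inverse normal CDF of the top-class probability and that of the runner-up class. The abstract does not state this formula, so treat it as an assumption about which certificate is meant; the function name and arguments are hypothetical.

```python
from statistics import NormalDist


def certified_radius(p_top, p_runner_up, sigma):
    """l2 radius certified for a Gaussian-smoothed classifier.

    p_top: lower bound on the probability of the predicted class under noise.
    p_runner_up: upper bound on the probability of the next most likely class.
    sigma: standard deviation of the Gaussian smoothing noise.
    """
    if not (0.0 < p_runner_up <= p_top < 1.0):
        raise ValueError("need 0 < p_runner_up <= p_top < 1")
    inv_cdf = NormalDist().inv_cdf
    return 0.5 * sigma * (inv_cdf(p_top) - inv_cdf(p_runner_up))


# A more confident smoothed prediction earns a larger certified radius.
print(certified_radius(0.90, 0.10, sigma=0.5))
print(certified_radius(0.99, 0.01, sigma=0.5))
```

Because the radius is a smooth function of the class probabilities, it can in principle be increased by gradient-based training, which is the attack-free idea the abstract describes.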

Certified Robustness to Label-Flipping Attacks via Randomized Smoothing

Feb 07, 2020
Elan Rosenfeld, Ezra Winston, Pradeep Ravikumar, J. Zico Kolter

Machine learning algorithms are known to be susceptible to data poisoning attacks, where an adversary manipulates the training data to degrade the performance of the resulting classifier. While many heuristic defenses have been proposed, few defenses exist that are certified against worst-case corruption of the training data. In this work, we propose a strategy to build linear classifiers that are certifiably robust against a strong variant of label-flipping, where each test example is targeted independently. In other words, for each test point, our classifier makes a prediction and includes a certification that its prediction would be the same had some number of training labels been changed adversarially. Our approach leverages randomized smoothing, a technique that has previously been used to guarantee, with high probability, test-time robustness to adversarial manipulation of the input to a classifier. We derive a variant which provides a deterministic, analytical bound, sidestepping the probabilistic certificates that traditionally result from the sampling subprocedure. Further, we obtain these certified bounds with no additional runtime cost over standard classification. We generalize our results to the multi-class case, providing what we believe to be the first multi-class classification algorithm that is certifiably robust to label-flipping attacks.

Game Design for Eliciting Distinguishable Behavior

Dec 12, 2019
Fan Yang, Liu Leqi, Yifan Wu, Zachary C. Lipton, Pradeep Ravikumar, William W. Cohen, Tom Mitchell

The ability to infer latent psychological traits from human behavior is key to developing personalized human-interacting machine learning systems. Approaches to infer such traits range from surveys to manually-constructed experiments and games. However, these traditional games are limited because they are typically designed based on heuristics. In this paper, we formulate the task of designing \emph{behavior diagnostic games} that elicit distinguishable behavior as a mutual information maximization problem, which can be solved by optimizing a variational lower bound. Our framework is instantiated by using prospect theory to model varying player traits, and Markov Decision Processes to parameterize the games. We validate our approach empirically, showing that our designed games can successfully distinguish among players with different traits, outperforming manually-designed ones by a large margin.

* 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

Diagnostic Curves for Black Box Models

Dec 02, 2019
David I. Inouye, Liu Leqi, Joon Sik Kim, Bryon Aragam, Pradeep Ravikumar

In safety-critical applications of machine learning, it is often necessary to look beyond standard metrics such as test accuracy in order to validate various qualitative properties, such as monotonicity with respect to a feature or combination of features, the absence of undesirable changes or oscillations in the response, and differences in outcomes (e.g. discrimination) for a protected class. To help answer this need, we propose a framework for approximately validating (or invalidating) various properties of a black box model by finding a univariate diagnostic curve in the input space whose output maximally violates a given property. These diagnostic curves show the exact value of the model along the curve and can be displayed with a simple and intuitive line graph. We demonstrate the usefulness of these diagnostic curves across multiple use-cases and datasets, including selecting between two models and understanding out-of-sample behavior.

* Accepted to NeurIPS 2019 Workshop on Safety and Robustness in Decision Making

Optimal Analysis of Subset-Selection Based L_p Low Rank Approximation

Oct 30, 2019
Chen Dan, Hong Wang, Hongyang Zhang, Yuchen Zhou, Pradeep Ravikumar

We study the low rank approximation problem of any given matrix $A$ over $\mathbb{R}^{n\times m}$ and $\mathbb{C}^{n\times m}$ in entry-wise $\ell_p$ loss, that is, finding a rank-$k$ matrix $X$ such that $\|A-X\|_p$ is minimized. Unlike the traditional $\ell_2$ setting, this particular variant is NP-Hard. We show that the algorithm of column subset selection, which is an algorithmic foundation of many existing algorithms, enjoys approximation ratio $(k+1)^{1/p}$ for $1\le p\le 2$ and $(k+1)^{1-1/p}$ for $p\ge 2$. This improves upon the previous $O(k+1)$ bound for $p\ge 1$ \cite{chierichetti2017algorithms}. We complement our analysis with lower bounds; these bounds match our upper bounds up to constant $1$ when $p\geq 2$. At the core of our techniques is an application of the \emph{Riesz-Thorin interpolation theorem} from harmonic analysis, which might be of independent interest to other algorithmic designs and analysis more broadly. As a consequence of our analysis, we provide better approximation guarantees for several other algorithms with various time complexities. For example, to make the algorithm of column subset selection computationally efficient, we analyze a polynomial time bi-criteria algorithm which selects $O(k\log m)$ columns. We show that this algorithm has an approximation ratio of $O((k+1)^{1/p})$ for $1\le p\le 2$ and $O((k+1)^{1-1/p})$ for $p\ge 2$. This improves over the best-known bound with an $O(k+1)$ approximation ratio. Our bi-criteria algorithm also implies an exact-rank method in polynomial time with a slightly larger approximation ratio.

* 20 pages, accepted by NeurIPS 2019
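
As a quick numerical reading of the bound quoted in the abstract above, the sketch below evaluates $(k+1)^{1/p}$ for $1\le p\le 2$ and $(k+1)^{1-1/p}$ for $p\ge 2$. The function name is hypothetical; it only evaluates the stated formula and does not implement column subset selection.

```python
def css_approx_ratio(k, p):
    """Approximation ratio bound quoted above for column subset selection
    under entry-wise l_p loss with target rank k."""
    if k < 1 or p < 1:
        raise ValueError("need k >= 1 and p >= 1")
    exponent = 1.0 / p if p <= 2 else 1.0 - 1.0 / p
    return (k + 1) ** exponent


k = 3
for p in (1, 1.5, 2, 4, 10):
    # Compare against the earlier O(k + 1) bound, taken here as k + 1.
    print(p, round(css_approx_ratio(k, p), 3), "vs", k + 1)
```

The two branches agree at $p=2$, where the bound is smallest at $\sqrt{k+1}$, and the bound approaches $k+1$ only as $p\to 1$ or $p\to\infty$.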

On Concept-Based Explanations in Deep Neural Networks

Oct 17, 2019
Chih-Kuan Yeh, Been Kim, Sercan O. Arik, Chun-Liang Li, Pradeep Ravikumar, Tomas Pfister

Deep neural networks (DNNs) build high-level intelligence on low-level raw features. An understanding of this high-level intelligence can be enabled by deciphering the concepts they base their decisions on, as in human-level thinking. In this paper, we study concept-based explainability for DNNs in a systematic framework. First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining a model's prediction behavior. Based on performance and variability motivations, we propose two definitions to quantify completeness. We show that under degenerate conditions, our method is equivalent to Principal Component Analysis. Next, we propose a concept discovery method that considers two additional constraints to encourage the interpretability of the discovered concepts. We use game-theoretic notions to aggregate over sets to define an importance score for each discovered concept, which we call ConceptSHAP. On specifically-designed synthetic datasets and real-world text and image datasets, we validate the effectiveness of our framework in finding concepts that are complete in explaining the decision, and interpretable.
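
One standard game-theoretic way to aggregate a set function into per-item scores is the Shapley value. The sketch below computes exact Shapley values for a toy completeness score over three concepts; the concept names and the score table are hypothetical, and this is an illustration of the general idea rather than the authors' ConceptSHAP implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`,
    which maps a frozenset of players to a number."""
    n = len(players)
    result = {}
    for player in players:
        others = [q for q in players if q != player]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {player}) - value(s))
        result[player] = total
    return result


def toy_completeness(subset):
    """Hypothetical completeness score of a subset of concepts."""
    score = 0.0
    if "stripes" in subset:
        score += 0.5
    if "fur" in subset:
        score += 0.3
    if "stripes" in subset and "fur" in subset:
        score += 0.2  # interaction term, split between the two concepts
    return score      # "sky" never contributes


scores = shapley_values(["stripes", "fur", "sky"], toy_completeness)
print(scores)
```

In this toy setting the scores sum to the completeness of the full concept set, and the concept that never changes the score receives zero importance.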

Learning Sparse Nonparametric DAGs

Sep 29, 2019
Xun Zheng, Chen Dan, Bryon Aragam, Pradeep Ravikumar, Eric P. Xing

We develop a framework for learning sparse nonparametric directed acyclic graphs (DAGs) from data. Our approach is based on a recent algebraic characterization of DAGs that led to the first fully continuous optimization for score-based learning of DAG models parametrized by a linear structural equation model (SEM). We extend this algebraic characterization to nonparametric SEM by leveraging nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that can be applied to a variety of nonparametric and semiparametric models including GLMs, additive noise models, and index models as special cases. We also explore the use of neural networks and orthogonal basis expansions to model nonlinearities for general nonparametric models. An extensive empirical study confirms the necessity of nonlinear dependency and the advantage of continuous optimization for score-based learning.

* 17 pages, 5 figures
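
The abstract above does not state the algebraic characterization it builds on. Assuming it refers to the trace-of-matrix-exponential function known from earlier work on continuous DAG learning, $h(W) = \mathrm{tr}(e^{W \circ W}) - d$, which is zero exactly when the weighted adjacency matrix $W$ has no directed cycle, a minimal sketch looks as follows. The function name and the truncated power series are illustrative choices, not the authors' code.

```python
def acyclicity(weights, terms=30):
    """Approximate h(W) = tr(exp(W * W)) - d, where W * W is the
    element-wise square, using a truncated power series for the exponential.
    h(W) is zero exactly when the weighted graph has no directed cycle."""
    d = len(weights)
    squared = [[w * w for w in row] for row in weights]
    # power holds (W * W)^k / k!, starting from the identity (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)
    for k in range(1, terms):
        power = [
            [sum(power[i][m] * squared[m][j] for m in range(d)) / k
             for j in range(d)]
            for i in range(d)
        ]
        trace += sum(power[i][i] for i in range(d))
    return trace - d


dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]      # edges 0 -> 1 -> 2
cyclic = [[0.0, 1.5, 0.0],
          [0.0, 0.0, -2.0],
          [0.7, 0.0, 0.0]]   # adds edge 2 -> 0, closing a cycle

print(acyclicity(dag))      # 0.0
print(acyclicity(cyclic))   # strictly positive
```

Because a function of this kind is smooth in the weights, the combinatorial acyclicity constraint can be handled by a continuous optimizer as an equality constraint, which is the property the framework extends to nonparametric models.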
