Jerry Li

Sample Efficient Toeplitz Covariance Estimation

May 28, 2019
Yonina C. Eldar, Jerry Li, Cameron Musco, Christopher Musco

We study the sample complexity of estimating the covariance matrix $T$ of a distribution $\mathcal{D}$ over $d$-dimensional vectors, under the assumption that $T$ is Toeplitz. This assumption arises in many signal processing problems, where the covariance between any two measurements only depends on the time or distance between those measurements. We are interested in estimation strategies that may choose to view only a subset of entries in each vector sample $x \sim \mathcal{D}$, which often equates to reducing hardware and communication requirements in applications ranging from wireless signal processing to advanced imaging. Our goal is to minimize both 1) the number of vector samples drawn from $\mathcal{D}$ and 2) the number of entries accessed in each sample. We provide some of the first non-asymptotic bounds on these sample complexity measures that exploit $T$'s Toeplitz structure, and by doing so, significantly improve on results for generic covariance matrices. Our bounds follow from a novel analysis of classical and widely used estimation algorithms (along with some new variants), including methods based on selecting entries from each vector sample according to a so-called sparse ruler. In many cases, we pair our upper bounds with matching or nearly matching lower bounds. In addition to results that hold for any Toeplitz $T$, we further study the important setting when $T$ is close to low-rank, which is often the case in practice. We show that methods based on sparse rulers perform even better in this setting, with sample complexity scaling sublinearly in $d$. Motivated by this finding, we develop a new covariance estimation strategy that further improves on all existing methods in the low-rank case: when $T$ is rank-$k$ or nearly rank-$k$, it achieves sample complexity depending polynomially on $k$ and only logarithmically on $d$.
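As a rough illustration of the sparse-ruler idea mentioned above, the sketch below reads only the entries of each sample indexed by a ruler and averages products of entries at each distance. It assumes zero-mean samples; the function names, the ruler `[0, 1, 4, 6]` for $d = 7$, and the toy samples are hypothetical choices for illustration, not the estimators analyzed in the paper.

```python
def is_sparse_ruler(marks, d):
    """Return True if every distance 0..d-1 is a difference of two marks."""
    distances = {abs(a - b) for a in marks for b in marks}
    return set(range(d)) <= distances


def estimate_toeplitz_column(samples, marks, d):
    """Estimate t[s] = Cov(x_i, x_{i+s}) for s = 0..d-1, reading only the
    entries of each sample indexed by `marks`. Assumes zero-mean samples."""
    if not is_sparse_ruler(marks, d):
        raise ValueError("marks do not cover every distance 0..d-1")
    sums = [0.0] * d
    counts = [0] * d
    for x in samples:
        for a in marks:
            for b in marks:
                if a >= b:  # each unordered pair once; a == b gives s = 0
                    sums[a - b] += x[a] * x[b]
                    counts[a - b] += 1
    return [sums[s] / counts[s] for s in range(d)]


def toeplitz_from_column(t):
    """Build the symmetric Toeplitz matrix whose (i, j) entry is t[|i - j|]."""
    d = len(t)
    return [[t[abs(i - j)] for j in range(d)] for i in range(d)]


# Toy data: only 4 of the 7 entries of each sample are ever read.
marks = [0, 1, 4, 6]
samples = [[1.0] * 7, [-1.0] * 7]
t_hat = estimate_toeplitz_column(samples, marks, 7)
T_hat = toeplitz_from_column(t_hat)
```

Because the covariance depends only on the distance between indices, one estimate per distance is enough to fill in the whole $d \times d$ matrix, which is why a ruler with far fewer than $d$ marks can suffice.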

Spectral Signatures in Backdoor Attacks

Nov 01, 2018
Brandon Tran, Jerry Li, Aleksander Madry

A recent line of work has uncovered a new form of data poisoning: so-called \emph{backdoor} attacks. These attacks are particularly dangerous because they do not affect a network's behavior on typical, benign data. Rather, the network only deviates from its expected output when triggered by a perturbation planted by an adversary. In this paper, we identify a new property of all known backdoor attacks, which we call \emph{spectral signatures}. This property allows us to utilize tools from robust statistics to thwart the attacks. We demonstrate the efficacy of these signatures in detecting and removing poisoned examples on real image sets and state-of-the-art neural network architectures. We believe that understanding spectral signatures is a crucial first step towards designing ML systems secure against such backdoor attacks.

* 16 pages, accepted to NIPS 2018

Privately Learning High-Dimensional Distributions

Sep 18, 2018
Gautam Kamath, Jerry Li, Vikrant Singhal, Jonathan Ullman

We present novel, computationally efficient, and differentially private algorithms for two fundamental high-dimensional learning problems: learning a multivariate Gaussian in $\mathbb{R}^d$ and learning a product distribution in $\{0,1\}^{d}$ in total variation distance. The sample complexity of our algorithms nearly matches the sample complexity of the optimal non-private learners for these tasks in a wide range of parameters. Thus, our results show that privacy comes essentially for free for these problems, providing a counterpoint to the many negative results showing that privacy is often costly in high dimensions. Our algorithms introduce a novel technical approach to reducing the sensitivity of the estimation procedure that we call recursive private preconditioning, which may find additional applications.

Twin-GAN -- Unpaired Cross-Domain Image Translation with Weight-Sharing GANs

Aug 26, 2018
Jerry Li

We present a framework for translating unlabeled images from one domain into analogous images in another domain. We employ a progressively growing skip-connected encoder-generator structure and train it with a GAN loss for realistic output, a cycle consistency loss for maintaining same-domain translation identity, and a semantic consistency loss that encourages the network to keep the input semantic features in the output. We apply our framework to the task of translating face images, and show that it is capable of learning semantic mappings for face images with no supervised one-to-one image mapping.

On the Limitations of First-Order Approximation in GAN Dynamics

Jun 03, 2018
Jerry Li, Aleksander Madry, John Peebles, Ludwig Schmidt

While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice. To address this issue, we study GAN dynamics in a simple yet rich parametric model that exhibits several of the common problematic convergence behaviors such as vanishing gradients, mode collapse, and diverging or oscillatory behavior. In spite of the non-convex nature of our model, we are able to perform a rigorous theoretical analysis of its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN with an optimal discriminator provably converges, while first order approximations of the discriminator steps lead to unstable GAN dynamics and mode collapse. Our result suggests that using first order discriminator steps (the de-facto standard in most existing GAN setups) might be one of the factors that makes GAN training challenging in practice.

* 18 pages, 4 figures, accepted to ICML 2018

Byzantine Stochastic Gradient Descent

Mar 23, 2018
Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li

This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the $m$ machines which allegedly compute stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\big( \frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2} \big)$ iterations. In contrast, traditional mini-batch SGD needs $T = O\big( \frac{1}{\varepsilon^2 m} \big)$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity.
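One way to read the two rates above, ignoring logarithmic factors, is to ask when the Byzantine term is no larger than the mini-batch term. The threshold $\alpha \le 1/\sqrt{m}$ used below is a worked consequence of the stated bounds, not a statement quoted from the paper.

```latex
\[
\alpha \le \frac{1}{\sqrt{m}}
\;\Longrightarrow\;
\frac{\alpha^2}{\varepsilon^2} \le \frac{1}{\varepsilon^2 m}
\;\Longrightarrow\;
\frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2}
\le \frac{2}{\varepsilon^2 m}.
\]
```

So when at most roughly a $1/\sqrt{m}$ fraction of machines is Byzantine, the iteration count matches that of mini-batch SGD up to constant and logarithmic factors; for larger $\alpha$, the $\alpha^2/\varepsilon^2$ term dominates.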

Being Robust (in High Dimensions) Can Be Practical

Mar 13, 2018
Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart

Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.

* Appeared in ICML 2017

Sever: A Robust Meta-Algorithm for Stochastic Optimization

Mar 07, 2018
Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart

In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers. To address this, we introduce a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers. Our method, Sever, possesses strong theoretical guarantees yet is also highly scalable -- beyond running the base learner itself, it only requires computing the top singular vector of a certain $n \times d$ matrix. We apply Sever to a drug design dataset and a spam classification dataset, and find that in both cases it has substantially greater robustness than several baselines. On the spam dataset, with $1\%$ corruptions, we achieved $7.4\%$ test error, compared to $13.4\%$-$20.5\%$ for the baselines, and $3\%$ error on the uncorrupted dataset. Similarly, on the drug design dataset, with $10\%$ corruptions, we achieved a test mean-squared error of $1.42$, compared to $1.51$-$2.33$ for the baselines, and $1.23$ error on the uncorrupted dataset.
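The singular-vector step can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure: it assumes the $n \times d$ matrix holds centered per-sample gradients from the base learner, scores each sample by its squared projection onto the top right singular vector, and drops a fixed number of the highest-scoring samples. The helper names and the `num_remove` parameter are hypothetical, and power iteration is used only to keep the sketch dependency-free.

```python
import math


def top_right_singular_vector(rows, iters=100):
    """Approximate the top right singular vector of `rows` by power
    iteration on G^T G, starting from the normalized all-ones vector."""
    d = len(rows[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]  # G v
        w = [sum(proj[i] * rows[i][j] for i in range(len(rows)))
             for j in range(d)]                                    # G^T (G v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v


def outlier_scores(gradients):
    """Squared projection of each centered gradient onto the top direction."""
    n, d = len(gradients), len(gradients[0])
    mean = [sum(g[j] for g in gradients) / n for j in range(d)]
    centered = [[g[j] - mean[j] for j in range(d)] for g in gradients]
    v = top_right_singular_vector(centered)
    return [sum(c[j] * v[j] for j in range(d)) ** 2 for c in centered]


def filter_once(gradients, num_remove):
    """Return indices of the samples kept after dropping the top scorers."""
    scores = outlier_scores(gradients)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:num_remove])
    return [i for i in range(len(gradients)) if i not in removed]


# Toy gradients: four small ones plus one far-away point (index 4).
gradients = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [10.0, 0.0]]
scores = outlier_scores(gradients)
kept = filter_once(gradients, num_remove=1)
```

In a full meta-algorithm one would presumably rerun the base learner on the kept samples and repeat the filtering; that outer loop is omitted here.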

Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms

Feb 23, 2018
Ilias Diakonikolas, Jerry Li, Ludwig Schmidt

We study the problem of robustly learning multi-dimensional histograms. A $d$-dimensional function $h: D \rightarrow \mathbb{R}$ is called a $k$-histogram if there exists a partition of the domain $D \subseteq \mathbb{R}^d$ into $k$ axis-aligned rectangles such that $h$ is constant within each such rectangle. Let $f: D \rightarrow \mathbb{R}$ be a $d$-dimensional probability density function and suppose that $f$ is $\mathrm{OPT}$-close, in $L_1$-distance, to an unknown $k$-histogram (with unknown partition). Our goal is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$ close to $f$, in $L_1$-distance. We give an algorithm for this learning problem that uses $n = \tilde{O}_d(k/\epsilon^2)$ samples and runs in time $\tilde{O}_d(n)$. For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the $d=1$ case was well-understood, but significant gaps in our understanding remained even for $d=2$.
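To make the definition of a $k$-histogram concrete, here is a minimal sketch for $d = 2$ and $k = 2$ on the domain $[0,1)^2$. The representation (a list of axis-aligned boxes with half-open sides, each paired with a constant value) and the function names are assumptions made for illustration; this shows only the object being learned, not the learning algorithm.

```python
def evaluate(pieces, point):
    """Value of the histogram at `point`.

    `pieces` is a list of (box, value) pairs, where `box` is a tuple of
    (lo, hi) intervals, one per coordinate, read as half-open [lo, hi).
    """
    for box, value in pieces:
        if all(lo <= c < hi for (lo, hi), c in zip(box, point)):
            return value
    raise ValueError("point lies outside the domain")


def total_mass(pieces):
    """Integral of the histogram: sum of value times box volume."""
    mass = 0.0
    for box, value in pieces:
        volume = 1.0
        for lo, hi in box:
            volume *= hi - lo
        mass += value * volume
    return mass


# A 2-histogram on [0,1)^2: constant 1.5 on the left half, 0.5 on the right.
pieces = [
    (((0.0, 0.5), (0.0, 1.0)), 1.5),
    (((0.5, 1.0), (0.0, 1.0)), 0.5),
]
left_value = evaluate(pieces, (0.25, 0.9))
right_value = evaluate(pieces, (0.5, 0.0))
mass = total_mass(pieces)
```

Here the two values were chosen so that the histogram integrates to 1 and is therefore itself a probability density.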
