Minh Pham

Distributionally Robust Classification on a Data Budget

Aug 07, 2023
Benjamin Feuer, Ameya Joshi, Minh Pham, Chinmay Hegde

Real-world uses of deep learning require predictable model behavior under distribution shifts. Models such as CLIP show emergent natural distributional robustness comparable to humans, but may require hundreds of millions of training samples. Can we train robust learners in a domain where data is limited? To rigorously address this question, we introduce JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions. Using JANuS, we perform a series of carefully controlled investigations of the factors contributing to robustness in image classification, and compare those results to findings from a large-scale meta-analysis. With this approach, we show that a standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples can attain robustness comparable to that of a CLIP ResNet-50 trained on 400 million samples. To our knowledge, this is the first result showing (near) state-of-the-art distributional robustness on a limited data budget. Our dataset is available at \url{https://huggingface.co/datasets/penfever/JANuS_dataset}, and the code used to reproduce our experiments can be found at \url{https://github.com/penfever/vlhub/}.
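For readers who want to inspect the data, a minimal sketch of loading JANuS from the Hugging Face Hub follows; the split and field names are assumptions (a configuration name may also be required), so consult the dataset card at the URL above for the actual schema.

```python
from itertools import islice
from datasets import load_dataset  # pip install datasets

# Streaming avoids downloading the full image corpus up front.
# NOTE: split/config names are assumptions; see the dataset card for the real schema.
ds = load_dataset("penfever/JANuS_dataset", split="train", streaming=True)

for example in islice(ds, 3):
    # Per the paper, each example pairs an image with a label and a caption;
    # the exact column names are not assumed here, so we just list the keys.
    print(example.keys())
```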

* TMLR 2023; openreview link: https://openreview.net/forum?id=D5Z2E8CNsD 

Circumventing Concept Erasure Methods For Text-to-Image Generative Models

Aug 03, 2023
Minh Pham, Kelly O. Marshall, Chinmay Hegde

Text-to-image generative models can produce photo-realistic images for an extremely broad range of concepts, and their usage has proliferated widely among the general public. On the flip side, these models have numerous drawbacks, including their potential to generate images featuring sexually explicit content, mirror artistic styles without permission, or even hallucinate (or deepfake) the likenesses of celebrities. Consequently, various methods have been proposed to "erase" sensitive concepts from text-to-image models. In this work, we examine five recently proposed concept erasure methods and show that the targeted concepts are not fully excised by any of them. Specifically, we leverage the existence of special learned word embeddings that can retrieve "erased" concepts from the sanitized models with no alterations to their weights. Our results highlight the brittleness of post hoc concept erasure methods and call into question their use in the algorithmic toolkit for AI safety.
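The core idea is to search the text-embedding space of the frozen, sanitized model for a special word embedding that regenerates the erased concept. A schematic sketch of that search loop is below; the frozen model, the reconstruction loss, and the name `FrozenToyGenerator` are stand-in toys introduced for illustration, not the actual diffusion pipeline or objective used in the paper.

```python
# Schematic sketch: keep the sanitized model's weights frozen and optimize only a
# new word embedding so that conditioning on it reconstructs images of the
# "erased" concept. FrozenToyGenerator is a hypothetical stand-in model.
import torch
import torch.nn as nn

class FrozenToyGenerator(nn.Module):
    """Stand-in for a frozen text-conditioned generator (weights never updated)."""
    def __init__(self, embed_dim=64, img_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, 512), nn.ReLU(), nn.Linear(512, img_dim))
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, token_embedding):
        return self.net(token_embedding)

generator = FrozenToyGenerator()
# A small set of reference images depicting the concept (flattened toy tensors here).
concept_images = torch.randn(8, 256)

# The only trainable parameter: a single new word embedding.
concept_embedding = nn.Parameter(torch.randn(64) * 0.01)
optimizer = torch.optim.Adam([concept_embedding], lr=1e-2)

for step in range(200):
    recon = generator(concept_embedding.expand(concept_images.size(0), -1))
    loss = nn.functional.mse_loss(recon, concept_images)  # toy reconstruction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# After optimization, prompting the (unchanged) model with this learned embedding
# can bring back the concept the erasure method was meant to remove.
```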

ZeroForge: Feedforward Text-to-Shape Without 3D Supervision

Jun 16, 2023
Kelly O. Marshall, Minh Pham, Ameya Joshi, Anushrut Jignasu, Aditya Balu, Adarsh Krishnamurthy, Chinmay Hegde

Current state-of-the-art methods for text-to-shape generation either require supervised training on a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches, as well as a combination of data-free CLIP-loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method via extensive qualitative and quantitative evaluations.
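The key recipe in the abstract is pairing a CLIP alignment loss with a contrastive term that pushes apart shapes generated for different prompts, so the generator does not collapse onto a single shape. A hedged sketch of that loss combination on placeholder embeddings follows; the weighting, temperature, and exact contrastive form are assumptions, not the paper's implementation.

```python
# Sketch of combining a CLIP-style alignment loss with a contrastive term to
# discourage mode collapse. The embeddings here are random placeholders; in
# ZeroForge they would come from CLIP encodings of rendered shapes and prompts.
import torch
import torch.nn.functional as F

def zeroforge_style_loss(image_emb, text_emb, temperature=0.07, contrastive_weight=0.1):
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Alignment loss: each rendered shape should match its own prompt.
    alignment = 1.0 - (image_emb * text_emb).sum(dim=-1).mean()

    # Contrastive loss over the batch: shapes for different prompts should differ,
    # which counteracts the generator collapsing onto one output.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(image_emb.size(0))
    contrastive = F.cross_entropy(logits, targets)

    return alignment + contrastive_weight * contrastive

# Toy usage with random placeholder embeddings.
loss = zeroforge_style_loss(torch.randn(4, 512), torch.randn(4, 512))
print(loss.item())
```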

* 19 pages; high-resolution figures needed to demonstrate 3D results 

Revisiting Self-Distillation

Jun 17, 2022
Minh Pham, Minsu Cho, Ameya Joshi, Chinmay Hegde

Knowledge distillation is the procedure of transferring "knowledge" from a large model (the teacher) to a more compact one (the student), and is often used in the context of model compression. When both models have the same architecture, the procedure is called self-distillation. Several works have anecdotally shown that a self-distilled student can outperform the teacher on held-out data. In this work, we systematically study self-distillation in a number of settings. We first show that even with a highly accurate teacher, self-distillation allows a student to surpass the teacher in all cases. Second, we revisit existing theoretical explanations of (self-)distillation and identify counterexamples, revealing possible drawbacks of these explanations. Finally, we provide an alternative explanation for the dynamics of self-distillation through the lens of loss landscape geometry. We conduct extensive experiments showing that self-distillation leads to flatter minima, thereby resulting in better generalization.
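For concreteness, a minimal sketch of one self-distillation round is below: the student shares the teacher's architecture and is trained on a mixture of cross-entropy and a softened KL term against the frozen teacher. The temperature and mixing weight are illustrative choices, not the paper's exact settings.

```python
# Minimal self-distillation sketch: teacher and student share an architecture;
# the student matches ground-truth labels and the teacher's softened predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 256), nn.ReLU(), nn.Linear(256, 10))

teacher = make_model()   # assumed already trained; kept frozen below
student = make_model()   # same architecture -> "self"-distillation
teacher.eval()

optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)
T, alpha = 4.0, 0.9      # temperature and distillation weight (illustrative values)

def distillation_step(images, labels):
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    # Softened KL term (scaled by T^2, as is standard) plus hard-label cross-entropy.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    loss = alpha * kd + (1 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch.
print(distillation_step(torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))))
```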

Transformer with Fourier Integral Attentions

Jun 01, 2022
Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho

Multi-head attention empowers the recent success of transformers, the state-of-the-art models that have achieved remarkable results in sequence modeling and beyond. These attention mechanisms compute the pairwise dot products between queries and keys, which corresponds to using unnormalized Gaussian kernels under the assumption that the queries follow a mixture of Gaussian distributions. There is no guarantee that this assumption holds in practice. In response, we first interpret attention in transformers as a nonparametric kernel regression. We then propose the FourierFormer, a new class of transformers in which the dot-product kernels are replaced by novel generalized Fourier integral kernels. Unlike dot-product kernels, which require choosing a good covariance matrix to capture the dependencies among data features, the generalized Fourier integral kernels capture such dependencies automatically and remove the need to tune a covariance matrix. We theoretically prove that our proposed Fourier integral kernels can efficiently approximate any key and query distributions. Compared to conventional transformers with dot-product attention, FourierFormers attain better accuracy and reduce redundancy between attention heads. We empirically corroborate the advantages of FourierFormers over baseline transformers in a variety of practical applications, including language modeling and image classification.
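Viewed as nonparametric kernel regression, attention weights are simply a normalized kernel evaluated between each query and key, so swapping the exponentiated dot product for another kernel is a small change. The sketch below uses a product-of-sinc kernel as a stand-in for the paper's generalized Fourier integral kernel; the exact kernel form and normalization in FourierFormer differ, so treat this as an illustration of kernel-regression attention, not the paper's implementation.

```python
# Kernel-regression view of attention: weights_ij = k(q_i, k_j) / sum_j k(q_i, k_j).
# The kernel below (a product of sinc functions) is only a stand-in for the
# generalized Fourier integral kernel used in FourierFormer.
import math
import torch

def kernel_attention(q, k, v, R=4.0, eps=1e-6):
    # q, k: (batch, seq, dim); v: (batch, seq, dim_v)
    diff = q.unsqueeze(2) - k.unsqueeze(1)            # (batch, seq_q, seq_k, dim)
    sinc = torch.sinc(R * diff / math.pi)             # sin(R*x)/(R*x), elementwise
    weights = sinc.prod(dim=-1).clamp_min(0.0)        # product over feature dims
    weights = weights / (weights.sum(dim=-1, keepdim=True) + eps)
    return weights @ v

out = kernel_attention(torch.randn(2, 5, 8), torch.randn(2, 5, 8), torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 5, 16])
```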

* 35 pages, 5 tables. Tan Nguyen and Minh Pham contributed equally to this work 

Smooth-Reduce: Leveraging Patches for Improved Certified Robustness

May 12, 2022
Ameya Joshi, Minh Pham, Minsu Cho, Leonid Boytsov, Filipe Condessa, J. Zico Kolter, Chinmay Hegde

Randomized smoothing (RS) has been shown to be a fast, scalable technique for certifying the robustness of deep neural network classifiers. However, methods based on RS require augmenting data with large amounts of noise, which leads to significant drops in accuracy. We propose a training-free, modified smoothing approach, Smooth-Reduce, that leverages patching and aggregation to provide improved classifier certificates. Our algorithm classifies overlapping patches extracted from an input image and aggregates the predicted logits to certify a larger radius around the input. We study two aggregation schemes -- max and mean -- and show that both approaches provide better certificates in terms of certified accuracy, average certified radius, and abstention rate compared to concurrent approaches. We also provide theoretical guarantees for such certificates, and empirically show significant improvements over other randomized smoothing methods that require expensive retraining. Further, we extend our approach to videos and provide meaningful certificates for video classifiers. A project page can be found at https://nyu-dice-lab.github.io/SmoothReduce/
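A hedged sketch of the patch-and-aggregate step follows: overlapping crops are extracted from the input, each is classified, and the per-patch logits are reduced with mean or max before being handed to a standard randomized-smoothing certification routine. The patch size, stride, resizing step, and base classifier here are placeholders, not the paper's configuration.

```python
# Sketch of Smooth-Reduce style aggregation: classify overlapping patches of an
# image and reduce the per-patch logits by mean or max. Certification would wrap
# this prediction function; patch size/stride and the classifier are placeholders.
import torch
import torch.nn.functional as F

def patch_logits(model, image, patch=24, stride=4):
    # image: (channels, height, width); returns logits for every overlapping patch.
    patches = image.unfold(1, patch, stride).unfold(2, patch, stride)  # (C, nH, nW, p, p)
    c, n_h, n_w, _, _ = patches.shape
    patches = patches.permute(1, 2, 0, 3, 4).reshape(n_h * n_w, c, patch, patch)
    # Resize patches back to the classifier's expected input size (an assumption).
    patches = F.interpolate(patches, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return model(patches)                                              # (num_patches, num_classes)

def smooth_reduce_predict(model, image, mode="mean"):
    logits = patch_logits(model, image)
    return logits.mean(dim=0) if mode == "mean" else logits.max(dim=0).values

# Toy usage with a random "classifier".
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
print(smooth_reduce_predict(toy_model, torch.randn(3, 32, 32)))
```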

Harnessing Geometric Constraints from Auxiliary Labels to Improve Embedding Functions for One-Shot Learning

Mar 05, 2021
Anand Ramakrishnan, Minh Pham, Jacob Whitehill

We explore the utility of harnessing auxiliary labels (e.g., facial expression) to impose geometric structure when training embedding models for one-shot learning (e.g., for face verification). We introduce novel geometric constraints on the embedding space learned by a deep model using either manually annotated or automatically detected auxiliary labels. We compare their performance (AUC) on four different face datasets (CK+, VGGFace-2, Tufts Face, and PubFig). Due to the additional structure encoded in the embedding space, our methods provide higher verification accuracy (99.7%, 86.2%, 99.4%, and 79.3% with our proposed TL+PDP+FBV loss, versus 97.5%, 72.6%, 93.1%, and 70.5% using a standard triplet loss on the four datasets, respectively). Our method is implemented purely in terms of the loss function and does not require any changes to the backbone of the embedding functions.
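Since the method is implemented purely at the loss level, a useful reference point is the standard triplet loss it is compared against; the paper's PDP and FBV terms impose additional auxiliary-label geometric constraints on top of something like the sketch below (those terms are not reproduced here, and the margin value is illustrative).

```python
# Baseline triplet loss on embeddings -- the starting point that the paper's
# geometric auxiliary-label constraints (PDP, FBV) are layered on top of.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Squared distances between L2-normalized embeddings; anchor and positive
    # share an identity, the negative does not.
    anchor, positive, negative = (F.normalize(x, dim=-1) for x in (anchor, positive, negative))
    d_pos = (anchor - positive).pow(2).sum(dim=-1)
    d_neg = (anchor - negative).pow(2).sum(dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random 128-d face embeddings.
emb = lambda: torch.randn(32, 128)
print(triplet_loss(emb(), emb(), emb()).item())
```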

* 8 pages, 3 figures, 2 tables 

Laplacian Smoothing Gradient Descent

Oct 17, 2018
Stanley Osher, Bao Wang, Penghang Yin, Xiyang Luo, Minh Pham, Alex Lin

We propose a very simple modification of gradient descent and stochastic gradient descent. We show that when applied to a variety of machine learning models, including softmax regression, convolutional neural networks, generative adversarial networks, and deep reinforcement learning, this very simple surrogate can dramatically reduce the variance and improve generalization accuracy. The new algorithm, which depends on a single nonnegative parameter, tends to avoid sharp local minima when applied to non-convex minimization; instead, it seeks somewhat flatter local (and often global) minima. The method only involves preconditioning the gradient by the inverse of a tri-diagonal, positive-definite matrix. The motivation comes from the theory of Hamilton-Jacobi partial differential equations, which demonstrates that the new algorithm is almost equivalent to doing gradient descent on a new function that (a) has the same global minima as the original function and (b) is "more convex". The programming effort required is minimal in cost and complexity. We implement our algorithm in both PyTorch and TensorFlow; the code will be made publicly available.
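As a sketch of the preconditioning step, the smoothed gradient can be obtained by solving a linear system with a tridiagonal positive-definite matrix of the form I + sigma*L, where L is a 1-D discrete Laplacian; with periodic boundary handling the matrix is circulant, so the solve reduces to an FFT, an elementwise division, and an inverse FFT. The periodic boundary, the matrix entries, and the sigma value below are assumptions consistent with the abstract's description, not a verified re-implementation.

```python
# Sketch of Laplacian-smoothing a gradient vector: precondition by the inverse of
# the circulant tridiagonal matrix I + sigma * L (periodic 1-D discrete Laplacian),
# solved cheaply in the Fourier domain. Boundary handling and sigma are assumptions.
import torch

def laplacian_smooth(grad, sigma=1.0):
    # grad: 1-D tensor of flattened parameter gradients.
    n = grad.numel()
    # First column of the circulant matrix: diagonal 1 + 2*sigma, neighbors -sigma.
    col = torch.zeros(n, dtype=grad.dtype)
    col[0] = 1.0 + 2.0 * sigma
    col[1] = -sigma
    col[-1] = -sigma
    # Solve (I + sigma*L) x = grad via FFT, since circulant systems diagonalize there.
    return torch.fft.ifft(torch.fft.fft(grad) / torch.fft.fft(col)).real

# Usage inside a (stochastic) gradient step: w <- w - lr * laplacian_smooth(grad_w).
g = torch.randn(1000)
print(laplacian_smooth(g, sigma=1.0).norm(), g.norm())  # smoothed gradient has smaller norm
```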

* 17 pages, 10 figures 

Stochastic Backward Euler: An Implicit Gradient Descent Algorithm for $k$-means Clustering

May 21, 2018
Penghang Yin, Minh Pham, Adam Oberman, Stanley Osher

In this paper, we propose an implicit gradient descent algorithm for the classic $k$-means problem. The implicit gradient step, or backward Euler step, is solved via a stochastic fixed-point iteration in which we randomly sample a mini-batch gradient at every iteration. It is the average of the fixed-point trajectory that is carried over to the next gradient step. We draw connections between the proposed stochastic backward Euler and the recent entropy stochastic gradient descent (Entropy-SGD) for improving the training of deep neural networks. Numerical experiments on various synthetic and real datasets show that the proposed algorithm provides better clustering results than standard $k$-means algorithms, in the sense that it attains a lower objective value (the clustering energy) and is much more robust to initialization.
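The implicit (backward Euler) step x_next = x - gamma * grad f(x_next) can be approximated by a stochastic fixed-point iteration that draws a fresh mini-batch gradient at each inner step and carries the average of the trajectory forward. A hedged sketch on a generic differentiable objective follows; the step size, iteration counts, and the use of a simple quadratic objective in place of the $k$-means energy are illustrative assumptions.

```python
# Sketch of a stochastic backward-Euler step: solve the implicit update
#   x_next = x - gamma * grad_f(x_next)
# by a fixed-point iteration that uses a fresh mini-batch gradient at each inner
# step, and carry the *average* of the inner trajectory to the next outer step.
# grad_f(x, batch) is any mini-batch gradient oracle (for k-means it would be
# the gradient of the clustering energy with respect to the centroids).
import torch

def stochastic_backward_euler_step(x, grad_f, batches, gamma=0.5):
    y = x.clone()
    trajectory_sum = torch.zeros_like(x)
    for batch in batches:                       # inner fixed-point iterations
        y = x - gamma * grad_f(y, batch)        # fixed-point map for the implicit step
        trajectory_sum += y
    return trajectory_sum / len(batches)        # average of the trajectory

# Toy usage: per mini-batch, f(x) = 0.5 * ||x - batch_mean||^2.
data = torch.randn(256, 2) + torch.tensor([3.0, -1.0])
def grad_f(x, batch):
    return x - batch.mean(dim=0)

x = torch.zeros(2)
for _ in range(20):                             # outer gradient steps
    batches = [data[torch.randint(0, 256, (32,))] for _ in range(10)]
    x = stochastic_backward_euler_step(x, grad_f, batches)
print(x)                                        # approaches the data mean near [3, -1]
```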
