Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tom Goldstein

What do Vision Transformers Learn? A Visual Exploration

Dec 13, 2022
Amin Ghiasi, Hamid Kazemi, Eitan Borgnia, Steven Reich, Manli Shu, Micah Goldblum, Andrew Gordon Wilson, Tom Goldstein

Figure 1 for What do Vision Transformers Learn? A Visual Exploration

Figure 2 for What do Vision Transformers Learn? A Visual Exploration

Figure 3 for What do Vision Transformers Learn? A Visual Exploration

Figure 4 for What do Vision Transformers Learn? A Visual Exploration

Vision transformers (ViTs) are quickly becoming the de-facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features. We also explore the underlying differences between ViTs and CNNs, and we find that transformers detect image background features, just like their convolutional counterparts, but their predictions depend far less on high-frequency information. On the other hand, both architecture types behave similarly in the way features progress from abstract patterns in early layers to concrete objects in late layers. In addition, we show that ViTs maintain spatial information in all layers except the final layer. In contrast to previous works, we show that the last layer most likely discards the spatial information and behaves as a learned global pooling operation. Finally, we conduct large-scale visualizations on a wide range of ViT variants, including DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method.

Via

Access Paper or Ask Questions

Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

Dec 12, 2022
Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, Tom Goldstein

Figure 1 for Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

Figure 2 for Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

Figure 3 for Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

Figure 4 for Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes. But do diffusion models create unique works of art, or are they replicating content directly from their training sets? In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication. We also identify cases where diffusion models, including the popular Stable Diffusion model, blatantly copy from their training data.

* Updated draft with the following changes (1) Clarified the LAION Aesthetics versions everywhere (2) Correction on which LAION Aesthetics version SD - 1.4 is finetuned on and updated figure 12 based on this (3) A section on possible causes of replication

Via

Access Paper or Ask Questions

Robustness Disparities in Face Detection

Nov 29, 2022
Samuel Dooley, George Z. Wei, Tom Goldstein, John P. Dickerson

Figure 1 for Robustness Disparities in Face Detection

Figure 2 for Robustness Disparities in Face Detection

Figure 3 for Robustness Disparities in Face Detection

Figure 4 for Robustness Disparities in Face Detection

Facial analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Many existing algorithmic audits examine the performance of these systems on later stage elements of facial analysis systems like facial recognition and age, emotion, or perceived gender prediction; however, a core component to these systems has been vastly understudied from a fairness perspective: face detection, sometimes called face localization. Since face detection is a pre-requisite step in facial analysis systems, the bias we observe in face detection will flow downstream to the other components like facial recognition and emotion prediction. Additionally, no prior work has focused on the robustness of these systems under various perturbations and corruptions, which leaves open the question of how various people are impacted by these phenomena. We present the first of its kind detailed benchmark of face detection systems, specifically examining the robustness to noise of commercial and academic models. We use both standard and recently released academic facial datasets to quantitatively analyze trends in face detection robustness. Across all the datasets and systems, we generally find that photos of individuals who are $\textit{masculine presenting}$, $\textit{older}$, of $\textit{darker skin type}$, or have $\textit{dim lighting}$ are more susceptible to errors than their counterparts in other identities.

* NeurIPS Datasets & Benchmarks Track 2022

Via

Access Paper or Ask Questions

K-SAM: Sharpness-Aware Minimization at the Speed of SGD

Oct 23, 2022
Renkun Ni, Ping-yeh Chiang, Jonas Geiping, Micah Goldblum, Andrew Gordon Wilson, Tom Goldstein

Figure 1 for K-SAM: Sharpness-Aware Minimization at the Speed of SGD

Figure 2 for K-SAM: Sharpness-Aware Minimization at the Speed of SGD

Figure 3 for K-SAM: Sharpness-Aware Minimization at the Speed of SGD

Figure 4 for K-SAM: Sharpness-Aware Minimization at the Speed of SGD

Sharpness-Aware Minimization (SAM) has recently emerged as a robust technique for improving the accuracy of deep neural networks. However, SAM incurs a high computational cost in practice, requiring up to twice as much computation as vanilla SGD. The computational challenge posed by SAM arises because each iteration requires both ascent and descent steps and thus double the gradient computations. To address this challenge, we propose to compute gradients in both stages of SAM on only the top-k samples with highest loss. K-SAM is simple and extremely easy-to-implement while providing significant generalization boosts over vanilla SGD at little to no additional cost.

* 13 pages, 2 figures

Via

Access Paper or Ask Questions

Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

Oct 19, 2022
Yuxin Wen, Arpit Bansal, Hamid Kazemi, Eitan Borgnia, Micah Goldblum, Jonas Geiping, Tom Goldstein

Figure 1 for Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

Figure 2 for Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

Figure 3 for Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

Figure 4 for Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

As industrial applications are increasingly automated by machine learning models, enforcing personal data ownership and intellectual property rights requires tracing training data back to their rightful owners. Membership inference algorithms approach this problem by using statistical techniques to discern whether a target sample was included in a model's training set. However, existing methods only utilize the unaltered target sample or simple augmentations of the target to compute statistics. Such a sparse sampling of the model's behavior carries little information, leading to poor inference capabilities. In this work, we use adversarial tools to directly optimize for queries that are discriminative and diverse. Our improvements achieve significantly more accurate membership inference than existing methods, especially in offline scenarios and in the low false-positive regime which is critical in legal settings. Code is available at https://github.com/YuxinWenRick/canary-in-a-coalmine.

* Code is available at https://github.com/YuxinWenRick/canary-in-a-coalmine

Via

Access Paper or Ask Questions

Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning

Oct 17, 2022
Yuxin Wen, Jonas Geiping, Liam Fowl, Hossein Souri, Rama Chellappa, Micah Goldblum, Tom Goldstein

Figure 1 for Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning

Figure 2 for Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning

Figure 3 for Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning

Figure 4 for Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning

Federated learning is particularly susceptible to model poisoning and backdoor attacks because individual users have direct control over the training data and model updates. At the same time, the attack power of an individual user is limited because their updates are quickly drowned out by those of many other users. Existing attacks do not account for future behaviors of other users, and thus require many sequential updates and their effects are quickly erased. We propose an attack that anticipates and accounts for the entire federated learning pipeline, including behaviors of other clients, and ensures that backdoors are effective quickly and persist even after multiple rounds of community updates. We show that this new attack is effective in realistic scenarios where the attacker only contributes to a small fraction of randomly sampled rounds and demonstrate this attack on image classification, next-word prediction, and sentiment analysis.

* Code is available at \url{https://github.com/YuxinWenRick/thinking-two-moves-ahead}

Via

Access Paper or Ask Questions

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

Oct 12, 2022
Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, Andrew Gordon Wilson

Figure 1 for How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

Figure 2 for How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

Figure 3 for How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

Figure 4 for How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an exchange rate between augmented and additional real data, we find that in out-of-distribution testing scenarios, augmentations which yield samples that are diverse, but inconsistent with the data distribution can be even more valuable than additional training data. Moreover, we find that data augmentations which encourage invariances can be more valuable than invariance alone, especially on small and medium sized training sets. Following this observation, we show that augmentations induce additional stochasticity during training, effectively flattening the loss landscape.

* 27 pages, 20 figures. Code at https://github.com/JonasGeiping/dataaugs

Via

Access Paper or Ask Questions

Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

Sep 15, 2022
Manli Shu, Weili Nie, De-An Huang, Zhiding Yu, Tom Goldstein, Anima Anandkumar, Chaowei Xiao

Figure 1 for Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

Figure 2 for Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

Figure 3 for Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

Figure 4 for Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot generalization in many downstream tasks with properly designed text prompts. Instead of relying on hand-engineered prompts, recent works learn prompts using the training data from downstream tasks. While effective, training on domain-specific data reduces a model's generalization capability to unseen new domains. In this work, we propose test-time prompt tuning (TPT), a method that can learn adaptive prompts on the fly with a single test sample. For image classification, TPT optimizes the prompt by minimizing the entropy with confidence selection so that the model has consistent predictions across different augmented views of each test sample. In evaluating generalization to natural distribution shifts, TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average, surpassing previous prompt tuning approaches that require additional task-specific training data. In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data. Project page: https://azshue.github.io/TPT.

* NeurIPS 2022

Via

Access Paper or Ask Questions

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

Aug 19, 2022
Arpit Bansal, Eitan Borgnia, Hong-Min Chu, Jie S. Li, Hamid Kazemi, Furong Huang, Micah Goldblum, Jonas Geiping, Tom Goldstein

Figure 1 for Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

Figure 2 for Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

Figure 3 for Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

Figure 4 for Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

Standard diffusion models involve an image transform -- adding Gaussian noise -- and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes. Our code is available at https://github.com/arpitbansal297/Cold-Diffusion-Models

Via

Access Paper or Ask Questions