Randomized smoothing (RS) is an effective and scalable technique for constructing neural network classifiers that are certifiably robust to adversarial perturbations. Most RS works focus on training a good base model that boosts the certified robustness of the smoothed model. However, existing RS techniques treat every data point the same, i.e., the variance of the Gaussian noise used to form the smoothed model is preset and universal for all training and test data. This preset and universal Gaussian noise variance is suboptimal since different data points have different margins and the local properties of the base model vary across the input examples. In this paper, we examine the impact of customized handling of examples and propose Instance-wise Randomized Smoothing (Insta-RS) -- a multiple-start search algorithm that assigns customized Gaussian variances to test examples. We also design Insta-RS Train -- a novel two-stage training algorithm that adaptively adjusts and customizes the noise level of each training example for training a base model that boosts the certified robustness of the instance-wise Gaussian smoothed model. Through extensive experiments on CIFAR-10 and ImageNet, we show that our method significantly enhances the average certified radius (ACR) as well as the clean data accuracy compared to existing state-of-the-art provably robust classifiers.
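As a concrete illustration of the instance-wise idea, the sketch below certifies a single test input under several candidate noise levels and keeps the variance yielding the largest certified radius. It is a minimal sketch built on the standard Gaussian smoothing certificate (radius sigma * Phi^{-1}(p_A)); the candidate grid, sample counts, and the simple best-of-grid selection are illustrative assumptions, not the multiple-start search used by Insta-RS.

```python
# Minimal per-instance certification sketch (illustrative, not Insta-RS itself).
# `base_classifier` is assumed to map a flattened input vector to an integer label.
import numpy as np
from scipy.stats import norm, beta


def lower_conf_bound(successes, n, alpha=0.001):
    """Clopper-Pearson lower confidence bound on the top-class probability."""
    if successes == 0:
        return 0.0
    return beta.ppf(alpha, successes, n - successes + 1)


def certify(base_classifier, x, sigma, n=1000, alpha=0.001):
    """Standard Gaussian randomized-smoothing certificate for one input."""
    noisy = x[None, :] + sigma * np.random.randn(n, x.shape[0])
    preds = np.array([base_classifier(z) for z in noisy])
    top_class = np.bincount(preds).argmax()
    p_a = lower_conf_bound(int((preds == top_class).sum()), n, alpha)
    if p_a <= 0.5:
        return top_class, 0.0                    # abstain: no certificate
    return top_class, sigma * norm.ppf(p_a)      # certified L2 radius


def instance_wise_certify(base_classifier, x, sigmas=(0.12, 0.25, 0.5, 1.0)):
    """Per input, keep the candidate sigma with the largest certified radius."""
    return max((certify(base_classifier, x, s) + (s,) for s in sigmas),
               key=lambda result: result[1])     # (class, radius, chosen sigma)
```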
Deep neural networks are powerful machines for visual pattern recognition, but reasoning tasks that are easy for humans may still be difficult for neural models. Humans can extrapolate simple reasoning strategies to solve difficult problems using long sequences of abstract manipulations, i.e., harder problems are solved by thinking for longer. In contrast, the sequential computing budget of feed-forward networks is limited by their depth, and networks trained on simple problems have no way of extending their reasoning capabilities without retraining. In this work, we observe that recurrent networks have the uncanny ability to closely emulate the behavior of non-recurrent deep models, often doing so with far fewer parameters, on both image classification and maze solving tasks. We also explore whether recurrent networks can make the generalization leap from simple problems to hard problems simply by increasing the number of recurrent iterations used at test time. To this end, we show that recurrent networks that are trained to solve simple mazes with few recurrent steps can indeed solve much more complex problems simply by performing additional recurrences during inference.
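The recurrence described above can be made concrete with a weight-tied residual block that is simply applied more times at inference. Below is a minimal PyTorch sketch, assuming illustrative layer widths and a classification head (not the paper's exact architecture).

```python
# A minimal sketch: one shared block reused for a variable number of iterations.
import torch
import torch.nn as nn


class RecurrentNet(nn.Module):
    def __init__(self, channels=64, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        # Weight-tied block: the same parameters are used at every iteration.
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, num_classes))

    def forward(self, x, iterations=5):
        h = torch.relu(self.stem(x))
        for _ in range(iterations):          # more iterations = more sequential compute
            h = torch.relu(h + self.block(h))
        return self.head(h)


model = RecurrentNet()
x = torch.randn(2, 3, 32, 32)
logits_easy = model(x, iterations=5)    # budget used during training
logits_hard = model(x, iterations=20)   # extra "thinking" at test time
```

Because the same weights are reused at every step, increasing `iterations` at test time adds sequential computation without adding parameters.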
Training vision-based autonomous driving in the real world can be inefficient and impractical. Vehicle simulation can be used to learn in the virtual world, and the acquired skills can be transferred to handle real-world scenarios more effectively. Between virtual and real visual domains, common features such as relative distance to road edges and other vehicles over time are consistent. These visual elements are intuitively crucial for human decision making during driving. We hypothesize that these spatio-temporal factors can also be used in transfer learning to improve generalization across domains. First, we propose a CNN+LSTM transfer learning framework to extract the spatio-temporal features representing vehicle dynamics from scenes. Next, we conduct an ablation study to quantitatively estimate the significance of various features in the decisions of driving systems. We observe that physically interpretable factors are highly correlated with network decisions, while representational differences between scenes are not. Finally, based on the results of our ablation study, we propose a transfer learning pipeline that uses saliency maps and physical features extracted from a source model to enhance the performance of a target model. Training of the target network is initialized with the learned CNN and LSTM weights (which capture the intrinsic physics of the moving vehicle w.r.t. its surroundings) transferred from the source domain. Our experiments show that this proposed transfer learning framework better generalizes across unseen domains compared to a baseline CNN model on a binary classification learning task.
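A minimal sketch of the CNN+LSTM arrangement and the weight transfer described above is given below; the feature dimensions, layer choices, and binary decision head are illustrative assumptions rather than the paper's exact configuration.

```python
# A minimal sketch: per-frame CNN features -> LSTM over time -> driving decision,
# with source-domain CNN/LSTM weights copied into the target-domain model.
import torch
import torch.nn as nn


class CNNLSTMDriver(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames):               # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])         # decision from the last time step


# Transfer: initialize the target-domain model with the source model's
# CNN and LSTM weights, then fine-tune on target-domain data.
source, target = CNNLSTMDriver(), CNNLSTMDriver()
target.cnn.load_state_dict(source.cnn.state_dict())
target.lstm.load_state_dict(source.lstm.state_dict())
```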
Large organizations such as social media companies continually release data, for example user images. At the same time, these organizations leverage their massive corpora of released data to train proprietary models that give them an edge over their competitors. These two behaviors can be in conflict as an organization wants to prevent competitors from using their own data to replicate the performance of their proprietary models. We solve this problem by developing a data poisoning method by which publicly released data can be minimally modified to prevent others from training models on it. Moreover, our method can be used in an online fashion so that companies can protect their data in real time as they release it. We demonstrate the success of our approach on ImageNet classification and on facial recognition.
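A minimal sketch of the overall idea (bounded, per-image perturbations crafted against a reference model before release) is shown below. The error-minimizing objective used here is an illustrative stand-in, not necessarily the paper's poisoning objective, and the reference `model`, step sizes, and L-infinity budget are assumptions.

```python
# Illustrative sketch: craft small bounded perturbations against a reference model
# so the released images carry less useful training signal (not the paper's exact method).
import torch
import torch.nn.functional as F


def craft_protective_noise(model, images, labels, eps=8 / 255, steps=20, lr=1 / 255):
    """Images are assumed to lie in [0, 1]; labels are integer class indices."""
    model.eval()
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(images + delta), labels)
        loss.backward()
        with torch.no_grad():
            # Error-minimizing step: push examples toward being "too easy",
            # then clamp the perturbation to an L-infinity budget.
            delta -= lr * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (images + delta.detach()).clamp(0, 1)
```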
Data poisoning and backdoor attacks manipulate training data to induce security breaches in a victim model. These attacks can be provably deflected using differentially private (DP) training methods, although this comes with a sharp decrease in model performance. The InstaHide method has recently been proposed as an alternative to DP training that leverages supposed privacy properties of the mixup augmentation, although without rigorous guarantees. In this work, we show that strong data augmentations, such as mixup and random additive noise, nullify poison attacks while enduring only a small accuracy trade-off. To explain these findings, we propose a training method, DP-InstaHide, which combines the mixup regularizer with additive noise. A rigorous analysis of DP-InstaHide shows that mixup does indeed have privacy advantages, and that training with k-way mixup provably yields at least k times stronger DP guarantees than a naive DP mechanism. Because mixup (as opposed to noise) is beneficial to model performance, DP-InstaHide provides a mechanism for achieving stronger empirical performance against poisoning attacks than other known DP methods.
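A minimal sketch of the augmentation described above combines k-way mixup of images and one-hot labels with random additive noise; the Dirichlet mixing weights and Gaussian noise scale are illustrative choices rather than the paper's exact mechanism.

```python
# Illustrative k-way mixup + additive noise augmentation.
import torch


def kway_mixup_with_noise(images, labels_onehot, k=4, noise_std=0.1):
    """Mix each example with k-1 random partners, then add Gaussian noise."""
    b = images.size(0)
    weights = torch.distributions.Dirichlet(torch.ones(k)).sample((b,))  # (b, k)
    mixed_x = torch.zeros_like(images)
    mixed_y = torch.zeros_like(labels_onehot)
    for j in range(k):
        perm = torch.arange(b) if j == 0 else torch.randperm(b)
        w = weights[:, j].view(-1, *([1] * (images.dim() - 1)))
        mixed_x += w * images[perm]
        mixed_y += weights[:, j:j + 1] * labels_onehot[perm]
    return mixed_x + noise_std * torch.randn_like(images), mixed_y
```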
Data poisoning is a threat model in which a malicious actor tampers with training data to manipulate outcomes at inference time. A variety of defenses against this threat model have been proposed, but each suffers from at least one of the following flaws: they are easily overcome by adaptive attacks, they severely reduce testing performance, or they cannot generalize to diverse data poisoning threat models. Adversarial training, and its variants, is currently considered the only empirically strong defense against (inference-time) adversarial attacks. In this work, we extend the adversarial training framework to instead defend against (training-time) poisoning and backdoor attacks. Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches. We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.
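A minimal sketch of the training-loop structure described above replaces a fraction of every batch with poisons crafted on the fly against the current model. The poison-crafting step here is a simple placeholder (bounded perturbations toward random incorrect labels), not the paper's attack, and the poison fraction and 10-class assumption are illustrative.

```python
# Illustrative "adversarial training against poisoning" loop structure.
import torch
import torch.nn.functional as F


def craft_batch_poisons(model, x, y, eps=8 / 255, steps=5, lr=2 / 255):
    """Placeholder poison crafting: push inputs toward random wrong labels
    (assumes 10 classes and images in [0, 1])."""
    target = (y + torch.randint_like(y, 1, 10)) % 10
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta.detach()).clamp(0, 1)


def train_step(model, optimizer, x, y, poison_frac=0.5):
    """Inject freshly crafted poisons into part of every training batch."""
    n_poison = int(poison_frac * x.size(0))
    x_poison = craft_batch_poisons(model, x[:n_poison], y[:n_poison])
    x_mix = torch.cat([x_poison, x[n_poison:]])
    optimizer.zero_grad()
    F.cross_entropy(model(x_mix), y).backward()
    optimizer.step()
```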
For safety of autonomous driving, vehicles need to be able to drive under various lighting, weather, and visibility conditions in different environments. These external and environmental factors, along with internal factors associated with sensors, can pose significant challenges to perceptual data processing, hence affecting the decision-making and control of the vehicle. In this work, we address this critical issue by introducing a framework for analyzing robustness of the learning algorithm w.r.t. varying quality of the image input for autonomous driving. Using the results of the sensitivity analysis, we further propose an algorithm to improve the overall performance of the task of "learning to steer". The results show that our approach is able to enhance the learning outcomes by up to 48%. A comparative study drawn between our approach and other related techniques, such as data augmentation and adversarial training, confirms the effectiveness of our algorithm as a way to improve the robustness and generalization of neural network training for autonomous driving.
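As a minimal sketch of the kind of sensitivity analysis described above, one can evaluate a steering model under a small set of controlled image degradations and compare its errors. The degradation set, the scalar-regression steering target, and the mean-absolute-error metric are assumptions for illustration, not the paper's exact protocol.

```python
# Illustrative sensitivity analysis of a steering model under image-quality degradations.
import torch
import torchvision.transforms.functional as TF


def degradations():
    """Illustrative image-quality perturbations (the paper's factor set may differ)."""
    return {
        "clean":  lambda x: x,
        "blur":   lambda x: TF.gaussian_blur(x, kernel_size=7),
        "noise":  lambda x: (x + 0.1 * torch.randn_like(x)).clamp(0, 1),
        "darker": lambda x: TF.adjust_brightness(x, 0.5),
    }


@torch.no_grad()
def sensitivity_report(model, loader):
    """Mean absolute steering error under each degradation."""
    report = {}
    for name, corrupt in degradations().items():
        errs = []
        for images, steering in loader:            # steering: ground-truth angles
            errs.append((model(corrupt(images)).squeeze() - steering).abs().mean())
        report[name] = torch.stack(errs).mean().item()
    return report
```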
Randomized smoothing has been successfully applied in high-dimensional image classification tasks to obtain models that are provably robust against input perturbations of bounded size. We extend this technique to produce certifiable robustness for vector-valued functions, i.e., bound the change in output caused by a small change in input. These functions are used in many areas of machine learning, such as image reconstruction, dimensionality reduction, super-resolution, etc., but due to the enormous dimensionality of the output space in these problems, generating meaningful robustness guarantees is difficult. We design a smoothing procedure that can leverage the local, potentially low-dimensional, behaviour of the function around an input to obtain probabilistic robustness certificates. We demonstrate the effectiveness of our method on multiple learning tasks involving vector-valued functions with a wide range of input and output dimensionalities.
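A minimal sketch of smoothing a vector-valued function under Gaussian input noise is given below; returning the sample with the smallest median distance to the other samples is one simple robust notion of a "center" of the output cloud, used here for illustration rather than as the paper's exact smoothing procedure or certificate.

```python
# Illustrative Monte-Carlo smoothing of a vector-valued function f.
import torch


@torch.no_grad()
def smooth_vector_function(f, x, sigma=0.25, n=200):
    """f is assumed to map a tensor shaped like x to a 1-D output vector."""
    noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
    outputs = torch.stack([f(z) for z in noisy])   # (n, output_dim)
    dists = torch.cdist(outputs, outputs)          # pairwise output distances
    center_idx = dists.median(dim=1).values.argmin()
    return outputs[center_idx]                     # robust "center" of the outputs
```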