Avi Schwarzschild

Benchmarking ChatGPT on Algorithmic Reasoning

Apr 04, 2024
Sean McLeish, Avi Schwarzschild, Tom Goldstein

Via arXiv

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

Jan 22, 2024
Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

Via arXiv

TOFU: A Task of Fictitious Unlearning for LLMs

Jan 11, 2024
Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter

Via arXiv

Effective Backdoor Mitigation Depends on the Pre-training Objective

Dec 05, 2023
Sahil Verma, Gantavya Bhatt, Avi Schwarzschild, Soumye Singhal, Arnav Mohanty Das, Chirag Shah, John P Dickerson, Jeff Bilmes

Via arXiv

NEFTune: Noisy Embeddings Improve Instruction Finetuning

Oct 10, 2023
Neel Jain, Ping-yeh Chiang, Yuxin Wen, John Kirchenbauer, Hong-Min Chu, Gowthami Somepalli, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

Via arXiv

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

Sep 04, 2023
Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, Tom Goldstein

Via arXiv

A Cookbook of Self-Supervised Learning

Apr 24, 2023
Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum

Via arXiv

Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective

Mar 23, 2023
Avi Schwarzschild, Max Cembalest, Karthik Rao, Keegan Hines, John Dickerson

Via arXiv