Joshua Kazdan

Internal Data Repetition Destroys Language Models

Jun 23, 2026

Quantifying the Effect of Test Set Contamination on Generative Evaluations

Jan 07, 2026

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed

Oct 01, 2025

Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models

Jun 16, 2025

Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF

Mar 28, 2025

Position: Model Collapse Does Not Mean What You Think

Mar 05, 2025

No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data

Feb 26, 2025

How Do Large Language Monkeys Get Their Power (Laws)?

Feb 24, 2025

The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Dec 12, 2024

Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World

Oct 22, 2024