Joshua Kazdan

Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models

Jun 16, 2025
Via arXiv

Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF

Mar 28, 2025
Via arXiv

Position: Model Collapse Does Not Mean What You Think

Mar 05, 2025
Via arXiv

No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data

Feb 26, 2025
Via arXiv

How Do Large Language Monkeys Get Their Power (Laws)?

Feb 24, 2025
Via arXiv

The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Dec 12, 2024
Via arXiv

Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World

Oct 22, 2024
Via arXiv

CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion

Sep 11, 2024
Via arXiv