Nikhil Kandpal

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Jun 05, 2025
Via arXiv

Enhancing Training Data Attribution with Representational Optimization

May 24, 2025
Via arXiv

Position: The Most Expensive Part of an LLM Should Be Its Training Data

Apr 16, 2025
Via arXiv

Efficient Model Development through Fine-tuning Transfer

Mar 25, 2025
Via arXiv

AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution

Nov 22, 2024
Via arXiv

User Inference Attacks on Large Language Models

Oct 13, 2023
Via arXiv

Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models

Jun 07, 2023
Via arXiv

Large Language Models Struggle to Learn Long-Tail Knowledge

Nov 15, 2022
Via arXiv

Music Enhancement via Image Translation and Vocoding

Apr 28, 2022
Via arXiv

Deduplicating Training Data Mitigates Privacy Risks in Language Models

Feb 16, 2022
Via arXiv