Suraj Srinivas

How much can we forget about Data Contamination?

Oct 04, 2024
Via arXiv

All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models

Jul 18, 2024
Via arXiv

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

Feb 16, 2024
Via arXiv

Certifying LLM Safety against Adversarial Prompting

Sep 06, 2023
Via arXiv

Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability

Jul 27, 2023
Via arXiv

Efficient Estimation of the Local Robustness of Machine Learning Models

Jul 26, 2023
Via arXiv

Consistent Explanations in the Face of Model Indeterminacy via Ensembling

Jun 13, 2023
Via arXiv

On Minimizing the Impact of Dataset Shifts on Actionable Explanations

Jun 11, 2023
Via arXiv

Word-Level Explanations for Analyzing Bias in Text-to-Image Models

Jun 03, 2023
Via arXiv

Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness

May 30, 2023
Via arXiv