Usha Bhalla

Interpretability Illusions with Sparse Autoencoders: Evaluating Robustness of Concept Representations

May 21, 2025

Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution

Jan 31, 2025

Towards Unifying Interpretability and Control: Evaluation via Intervention

Nov 07, 2024

All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models

Jul 18, 2024

Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers

Jul 11, 2024

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

Feb 16, 2024

Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability

Jul 27, 2023

Do Vision-Language Pretrained Models Learn Primitive Concepts?

Mar 31, 2022