Himabindu Lakkaraju

All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models

Jul 18, 2024
Via arXiv

Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers

Jul 11, 2024
Via arXiv

On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models

Jun 15, 2024
Via arXiv

Interpretability Needs a New Paradigm

May 08, 2024
Via arXiv

More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness

Apr 29, 2024
Via arXiv

Manipulating Large Language Models to Increase Product Visibility

Apr 11, 2024
Via arXiv

Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

Apr 06, 2024
Via arXiv

Towards Safe and Aligned Large Language Models for Medicine

Mar 06, 2024
Via arXiv

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Feb 27, 2024
Via arXiv

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

Feb 16, 2024
Via arXiv