Chirag Agarwal

Towards Safe and Aligned Large Language Models for Medicine

Mar 06, 2024
Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju

Understanding the Effects of Iterative Prompting on Truthfulness

Feb 09, 2024
Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

Feb 08, 2024
Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju

Quantifying Uncertainty in Natural Language Explanations of Large Language Models

Nov 06, 2023
Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju

Are Large Language Models Post Hoc Explainers?

Oct 10, 2023
Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

On the Trade-offs between Adversarial Robustness and Actionable Explanations

Sep 28, 2023
Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

Certifying LLM Safety against Adversarial Prompting

Sep 06, 2023
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Soheil Feizi, Hima Lakkaraju

Counterfactual Explanation Policies in RL

Jul 25, 2023
Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal

Explaining RL Decisions with Trajectories

May 06, 2023
Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian

Explain like I am BM25: Interpreting a Dense Model's Ranked-List with a Sparse Approximation

Apr 25, 2023
Michael Llordes, Debasis Ganguly, Sumit Bhatia, Chirag Agarwal
