Keegan Hines

Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers

Oct 16, 2025
Via arXiv

Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Oct 02, 2025
Via arXiv

OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities

May 29, 2025
Via arXiv

Lessons From Red Teaming 100 Generative AI Products

Jan 13, 2025
Via arXiv

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Mar 20, 2024
Via arXiv

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models

Dec 21, 2023
Via arXiv

Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective

Mar 23, 2023
Via arXiv

Achieving Downstream Fairness with Geometric Repair

Mar 14, 2022
Via arXiv

Counterfactual Explanations for Machine Learning: Challenges Revisited

Jun 14, 2021
Via arXiv

Amortized Generation of Sequential Counterfactual Explanations for Black-box Models

Jun 07, 2021
Via arXiv