Oam Patel

Defending Against Unforeseen Failure Modes with Latent Adversarial Training

Mar 08, 2024
Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell

Via arXiv

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Jun 07, 2023
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

Via arXiv