Tatsunori Hashimoto

Language Models with Conformal Factuality Guarantees

Feb 15, 2024

Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution

Jan 29, 2024

On the Learnability of Watermarks for Language Models

Dec 07, 2023

Removing RLHF Protections in GPT-4 via Fine-Tuning

Nov 10, 2023

MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks

Oct 31, 2023

On the Fairness ROAD: Robust Optimization for Adversarial Debiasing

Oct 27, 2023

Learning to (Learn at Test Time)

Oct 20, 2023

Benchmarking and Improving Generator-Validator Consistency of Language Models

Oct 03, 2023

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

Sep 25, 2023

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Sep 25, 2023