Percy Liang

Shammie

The Mighty ToRR: A Benchmark for Table Reasoning and Robustness

Feb 26, 2025
Via arXiv

Independence Tests for Language Models

Feb 17, 2025
Via arXiv

Auditing Prompt Caching in Language Model APIs

Feb 11, 2025
Via arXiv

Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences

Feb 03, 2025
Via arXiv

Eliciting Language Model Behaviors with Investigator Agents

Feb 03, 2025
Via arXiv

s1: Simple test-time scaling

Jan 31, 2025
Via arXiv

International AI Safety Report

Jan 29, 2025
Via arXiv

Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice

Dec 09, 2024
Via arXiv

Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback

Dec 03, 2024
Via arXiv

RedPajama: an Open Dataset for Training Large Language Models

Nov 19, 2024
Via arXiv