Percy Liang

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models
Aug 15, 2024

The Foundation Model Transparency Index v1.1: May 2024
Jul 17, 2024

AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models
Jul 11, 2024

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
Jun 26, 2024

AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies
Jun 25, 2024

OpenVLA: An Open-Source Vision-Language-Action Model
Jun 13, 2024

BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments
May 27, 2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons
Apr 18, 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Apr 06, 2024

FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning
Apr 02, 2024