Diyi Yang

Stanford University

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

May 17, 2025

AutoLibra: Agent Metric Induction from Open-Ended Feedback

May 05, 2025

SWE-smith: Scaling Data for Software Engineering Agents

Apr 30, 2025

Challenges and Paths Towards AI for Software Engineering

Mar 28, 2025

EgoNormia: Benchmarking Physical Social Norm Understanding

Feb 27, 2025

Mind the Gap! Static and Interactive Evaluations of Large Audio Models

Feb 21, 2025

EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking

Feb 18, 2025

No Preference Left Behind: Group Distributional Preference Optimization

Dec 28, 2024

Dynamic Skill Adaptation for Large Language Models

Dec 26, 2024

Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration

Dec 20, 2024