
Valentina Pyatkin

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Nov 22, 2024

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Oct 24, 2024

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

Oct 22, 2024

Diverging Preferences: When do Annotators Disagree and do Models Know?

Oct 18, 2024

Explicating the Implicit: Argument Detection Beyond Sentence Boundaries

Aug 08, 2024

Self-Directed Synthetic Dialogues and Revisions Technical Report

Jul 25, 2024

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Jun 13, 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Jun 07, 2024

Superlatives in Context: Explicit and Implicit Domain Restrictions for Superlative Frames

May 31, 2024

RewardBench: Evaluating Reward Models for Language Modeling

Mar 20, 2024