He He

Spontaneous Reward Hacking in Iterative Self-Refinement

Jul 05, 2024
Via arXiv

LLMs Are Prone to Fallacies in Causal Inference

Jun 18, 2024
Via arXiv

Iterative Reasoning Preference Optimization

Apr 30, 2024
Via arXiv

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Apr 24, 2024
Via arXiv

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Via arXiv

Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World

Mar 30, 2024
Via arXiv

Parallel Structures in Pre-training Data Yield In-Context Learning

Feb 19, 2024
Via arXiv

Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning

Jan 25, 2024
Via arXiv

Pragmatic Radiology Report Generation

Nov 28, 2023
Via arXiv

Show Your Work with Confidence: Confidence Bands for Tuning Curves

Nov 16, 2023
Via arXiv