Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Manon Revel

Arbiters of Ambivalence: Challenges of Using LLMs in No-Consensus Tasks

May 28, 2025

Bhaktipriya Radharapu, Manon Revel, Megan Ung, Sebastian Ruder, Adina Williams

Abstract:The increasing use of LLMs as substitutes for humans in ``aligning'' LLMs has raised questions about their ability to replicate human judgments and preferences, especially in ambivalent scenarios where humans disagree. This study examines the biases and limitations of LLMs in three roles: answer generator, judge, and debater. These roles loosely correspond to previously described alignment frameworks: preference alignment (judge) and scalable oversight (debater), with the answer generator reflecting the typical setting with user interactions. We develop a ``no-consensus'' benchmark by curating examples that encompass a variety of a priori ambivalent scenarios, each presenting two possible stances. Our results show that while LLMs can provide nuanced assessments when generating open-ended answers, they tend to take a stance on no-consensus topics when employed as judges or debaters. These findings underscore the necessity for more sophisticated methods for aligning LLMs without human oversight, highlighting that LLMs cannot fully capture human disagreement even on topics where humans themselves are divided.

Via

Access Paper or Ask Questions

SEAL: Systematic Error Analysis for Value ALignment

Aug 16, 2024

Manon Revel, Matteo Cargnelutti, Tyna Eloundou, Greg Leppert

Abstract:Reinforcement Learning from Human Feedback (RLHF) aims to align language models (LMs) with human values by training reward models (RMs) on binary preferences and using these RMs to fine-tune the base LMs. Despite its importance, the internal mechanisms of RLHF remain poorly understood. This paper introduces new metrics to evaluate the effectiveness of modeling and aligning human values, namely feature imprint, alignment resistance and alignment robustness. We categorize alignment datasets into target features (desired values) and spoiler features (undesired concepts). By regressing RM scores against these features, we quantify the extent to which RMs reward them - a metric we term feature imprint. We define alignment resistance as the proportion of the preference dataset where RMs fail to match human preferences, and we assess alignment robustness by analyzing RM responses to perturbed inputs. Our experiments, utilizing open-source components like the Anthropic/hh-rlhf preference dataset and OpenAssistant RMs, reveal significant imprints of target features and a notable sensitivity to spoiler features. We observed a 26% incidence of alignment resistance in portions of the dataset where LM-labelers disagreed with human preferences. Furthermore, we find that misalignment often arises from ambiguous entries within the alignment dataset. These findings underscore the importance of scrutinizing both RMs and alignment datasets for a deeper understanding of value alignment.

* 28 pages, 17 Figures, 8 Tables

Via

Access Paper or Ask Questions

The Optimal Size of an Epistemic Congress

Jul 02, 2021

Manon Revel, Tao Lin, Daniel Halpern

Figure 1 for The Optimal Size of an Epistemic Congress

Figure 2 for The Optimal Size of an Epistemic Congress

Figure 3 for The Optimal Size of an Epistemic Congress

Figure 4 for The Optimal Size of an Epistemic Congress

Abstract:We analyze the optimal size of a congress in a representative democracy. We take an epistemic view where voters decide on a binary issue with one ground truth outcome, and each voter votes correctly according to their competence levels in $[0, 1]$. Assuming that we can sample the best experts to form an epistemic congress, we find that the optimal congress size should be linear in the population size. This result is striking because it holds even when allowing the top representatives to be accurate with arbitrarily high probabilities. We then analyze real world data, finding that the actual sizes of congresses are much smaller than the optimal size our theoretical results suggest. We conclude by analyzing under what conditions congresses of sub-optimal sizes would still outperform direct democracy, in which all voters vote.

Via

Access Paper or Ask Questions