Andrew M. Bean

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

Jun 12, 2026

To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands

May 12, 2026

Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings

Oct 30, 2025

LLMs Don't Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations

Sep 11, 2025

Clinical knowledge in LLMs does not translate to human interactions

Apr 26, 2025

Evaluating the role of 'Constitutions' for learning from AI feedback

Nov 15, 2024

Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering

Aug 15, 2024

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

Jun 11, 2024

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

Oct 11, 2023

Exploring the Landscape of Large Language Models In Medical Question Answering: Observations and Open Questions

Oct 11, 2023