Samuel Albanie

On scalable oversight with weak LLMs judging strong LLMs

Jul 05, 2024

HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits

Jun 05, 2024

Inverse Constitutional AI: Compressing Preferences into Principles

Jun 02, 2024

A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

May 16, 2024

SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation

May 14, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Apr 08, 2024

Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress

Feb 29, 2024

A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval

Feb 29, 2024

InstructVideo: Instructing Video Diffusion Models with Human Feedback

Dec 19, 2023