Samuel Albanie

Michael Pokorny

Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?

Nov 07, 2024

A Practitioner's Guide to Continual Multimodal Pretraining

Aug 26, 2024

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

Aug 21, 2024

On scalable oversight with weak LLMs judging strong LLMs

Jul 05, 2024

HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits

Jun 05, 2024

Inverse Constitutional AI: Compressing Preferences into Principles

Jun 02, 2024

A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

May 16, 2024

SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation

May 14, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Apr 08, 2024