Kyle Richardson

Shammie

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

Sep 11, 2024

SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

Jun 07, 2024

TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation

Feb 08, 2024

OLMo: Accelerating the Science of Language Models

Feb 07, 2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Jan 31, 2024

Paloma: A Benchmark for Evaluating Language Model Fit

Dec 16, 2023

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dec 15, 2023

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena

Oct 09, 2023

Language Models with Rationality

May 23, 2023

DISCO: Distilling Phrasal Counterfactuals with Large Language Models

Dec 20, 2022