Graham Neubig

Carnegie Mellon University

CodeRAG-Bench: Can Retrieval Augment Code Generation?

Jun 20, 2024
Via arXiv

GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

Jun 19, 2024
Via arXiv

Language Modeling with Editable External Knowledge

Jun 17, 2024
Via arXiv

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Jun 09, 2024
Via arXiv

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

Jun 03, 2024
Via arXiv

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

May 02, 2024
Via arXiv

In-Context Learning with Long-Context Models: An In-Depth Exploration

Apr 30, 2024
Via arXiv

Better Synthetic Data by Retrieving and Transforming Existing Datasets

Apr 26, 2024
Via arXiv

An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models

Apr 10, 2024
Via arXiv

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

Apr 09, 2024
Via arXiv