Ori Yoran

The BrowserGym Ecosystem for Web Agent Research

Dec 10, 2024

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

Jul 22, 2024

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

Jul 08, 2024

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

Oct 02, 2023

Evaluating the Ripple Effects of Knowledge Editing in Language Models

Jul 24, 2023

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

Apr 25, 2023

QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs

May 26, 2022

CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

Jan 14, 2022

SCROLLS: Standardized CompaRison Over Long Language Sequences

Jan 10, 2022

Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills

Jul 15, 2021