Jonathan Herzig

A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

Feb 02, 2024

Multilingual Instruction Tuning With Just a Pinch of Multilinguality

Jan 08, 2024

A Comprehensive Evaluation of Tool-Assisted Generation Strategies

Oct 16, 2023

Evaluating and Modeling Attribution for Cross-Lingual Question Answering

May 23, 2023

What You See is What You Read? Improving Text-Image Alignment Evaluation

May 22, 2023

TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models

May 18, 2023

mFACE: Multilingual Summarization with Factual Consistency Evaluation

Dec 20, 2022

Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Dec 15, 2022

QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs

May 26, 2022

Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing

May 24, 2022