Haidar Khan

CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark

Oct 30, 2025
Via arXiv

Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage

Oct 02, 2025
Via arXiv

ConfQA: Answer Only If You Are Confident

Jun 08, 2025
Via arXiv

ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition

Apr 17, 2025
Via arXiv

ALLaM: Large Language Models for Arabic and English

Jul 22, 2024
Via arXiv

A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

Jul 04, 2024
Via arXiv

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

Feb 01, 2024
Via arXiv

Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

May 19, 2023
Via arXiv

Low-Resource Compositional Semantic Parsing with Concept Pretraining

Jan 30, 2023
Via arXiv

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

Aug 03, 2022
Via arXiv