Nandan Thakur

ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget

Apr 02, 2026

Overview of the TREC 2025 Retrieval Augmented Generation (RAG) Track

Mar 10, 2026

Still Fresh? Evaluating Temporal Drift in Retrieval Benchmarks

Mar 04, 2026

Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval

May 22, 2025

Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses

Apr 28, 2025

The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models

Apr 21, 2025

Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges

Apr 21, 2025

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

Apr 17, 2025

MMTEB: Massive Multilingual Text Embedding Benchmark

Feb 19, 2025

Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework

Nov 14, 2024