Jimmy Lin

Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?

May 22, 2025
Via arXiv

Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards

May 07, 2025
Via arXiv

Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality

May 05, 2025
Via arXiv

Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses

Apr 28, 2025
Via arXiv

The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models

Apr 21, 2025
Via arXiv

Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges

Apr 21, 2025
Via arXiv

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

Apr 17, 2025
Via arXiv

Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

Apr 01, 2025
Via arXiv

Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning

Mar 08, 2025
Via arXiv

Teaching Dense Retrieval Models to Specialize with Listwise Distillation and LLM Data Augmentation

Feb 27, 2025
Via arXiv