Negar Arabzadeh

Benchmarking LLM-based Relevance Judgment Methods

Apr 17, 2025
Via arXiv

A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment

Apr 16, 2025
Via arXiv

exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem

Feb 11, 2025
Via arXiv

Benchmarking Prompt Sensitivity in Large Language Models

Feb 09, 2025
Via arXiv

EMPRA: Embedding Perturbation Rank Attack against Neural Ranking Models

Dec 20, 2024
Via arXiv

Offline Evaluation of Set-Based Text-to-Image Generation

Oct 22, 2024
Via arXiv

IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents

Jul 12, 2024
Via arXiv

Assessing and Verifying Task Utility in LLM-Powered Applications

May 03, 2024
Via arXiv

Ranked List Truncation for Large Language Model-based Re-Ranking

Apr 28, 2024
Via arXiv

Generative Information Retrieval Evaluation

Apr 11, 2024
Via arXiv