
Fernando Diaz

Quantifying the Statistical Effect of Rubric Modifications on Human-Autorater Agreement

May 07, 2026

Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation

Apr 22, 2026

Evaluation of Agents under Simulated AI Marketplace Dynamics

Apr 15, 2026

Overview of the TREC 2025 Tip-of-the-Tongue track

Jan 28, 2026

Diversification as Risk Minimization

Oct 26, 2025

Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor

Jun 17, 2025

LTRR: Learning To Rank Retrievers for LLMs

Jun 16, 2025

Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy

Mar 25, 2025

Tip of the Tongue Query Elicitation for Simulated Evaluation

Feb 25, 2025

Offline Evaluation of Set-Based Text-to-Image Generation

Oct 22, 2024