Darshan Deshpande

DETOUR: An Interactive Benchmark for Dual-Agent Search and Reasoning

Jan 30, 2026

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis

Jan 27, 2026

MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments

Oct 01, 2025

TRAIL: Trace Reasoning and Agentic Issue Localization

May 13, 2025

Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning

Mar 24, 2025

GLIDER: Grading LLM Interactions and Decisions using Explainable Ranking

Dec 18, 2024

GNOME: Generating Negotiations through Open-Domain Mapping of Exchanges

Jun 16, 2024

Robust Text Classification: Analyzing Prototype-Based Networks

Nov 11, 2023

Contextualizing Argument Quality Assessment with Relevant Knowledge

May 20, 2023

Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments

Dec 12, 2022