Aryan Shrivastava

Modeling and Predicting Multi-Turn Answer Instability in Large Language Models

Nov 12, 2025
Via arXiv

AbsenceBench: Language Models Can't Tell What's Missing

Jun 13, 2025
Via arXiv

DICE: A Framework for Dimensional and Contextual Evaluation of Language Models

Apr 14, 2025
Via arXiv

Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare

Feb 22, 2025
Via arXiv

Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations

Oct 17, 2024
Via arXiv