
Simerjot Kaur

WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

May 28, 2026
Via arXiv

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

May 11, 2026
Via arXiv

Deep FinResearch Bench: Evaluating AI's Ability to Conduct Professional Financial Investment Research

Apr 22, 2026
Via arXiv

Distill and Align Decomposition for Enhanced Claim Verification

Feb 25, 2026
Via arXiv

How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains

Jan 13, 2026
Via arXiv

A Variational Approach for Mitigating Entity Bias in Relation Extraction

Jun 13, 2025
Via arXiv

GenPlanX. Generation of Plans and Execution

Jun 12, 2025
Via arXiv

Conservative Bias in Large Language Models: Measuring Relation Predictions

Jun 09, 2025
Via arXiv

Calibrating LLM Confidence by Probing Perturbed Representation Stability

May 27, 2025
Via arXiv

FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking

Apr 22, 2025
Via arXiv