Ajay Narayanan Sridhar

When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

May 06, 2026

Philip Wootaek Shin, Ajay Narayanan Sridhar, Sivani Devarapalli, Rui Zhang, Jack Sampson, Vijaykrishnan Narayanan

Abstract: Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, failing at tasks that require accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while they offer partial improvements, they do not fully resolve hallucinations. Our results reveal a gap between perceptual robustness and relational understanding, highlighting the need for more robust, geometry-aware VLMs.
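
The two perturbations named in the abstract can be illustrated with a minimal sketch. The abstract does not state the rotation angles or the noise model, so the quarter-turn rotation, the additive Gaussian noise, and the helper names `rotate90` and `add_gaussian_noise` are all assumptions made for illustration, not the authors' pipeline.

```python
import random

def rotate90(image, turns=1):
    """Rotate a 2D grid (list of rows) clockwise by 90 degrees, `turns` times."""
    for _ in range(turns % 4):
        # Reversing the row order and transposing gives a clockwise quarter turn.
        image = [list(row) for row in zip(*image[::-1])]
    return image

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to each pixel and clamp to [0, 255]."""
    return [
        [min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]

# Hypothetical 2x2 grayscale image; real inputs would be full-size photos.
image = [[0, 64], [128, 255]]
rotated = rotate90(image)  # assumed perturbation 1: rotation
noisy = add_gaussian_noise(rotated, sigma=5.0, rng=random.Random(0))  # assumed perturbation 2: noise
```

A perturbed image like `noisy` would then be passed to the model with the same relational question as the clean image, so that any change in the answer can be attributed to the distortion.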

Evaluating Large Language Models on Rare Disease Diagnosis: A Case Study using House M.D

Nov 14, 2025

Arsh Gupta, Ajay Narayanan Sridhar, Bonam Mingole, Amulya Yadav

Abstract: Large language models (LLMs) have demonstrated capabilities across diverse domains, yet their performance on rare disease diagnosis from narrative medical cases remains underexplored. We introduce a novel dataset of 176 symptom-diagnosis pairs extracted from House M.D., a medical television series validated for teaching rare disease recognition in medical education. We evaluate four state-of-the-art LLMs (GPT 4o mini, GPT 5 mini, Gemini 2.5 Flash, and Gemini 2.5 Pro) on narrative-based diagnostic reasoning tasks. Results show significant variation in performance, ranging from 16.48% to 38.64% accuracy, with newer model generations demonstrating a 2.3 times improvement. While all models face substantial challenges with rare disease diagnosis, the observed improvement across architectures suggests promising directions for future development. Our educationally validated benchmark establishes baseline performance metrics for narrative medical reasoning and provides a publicly accessible evaluation framework for advancing AI-assisted diagnosis research.
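
The reported figures can be checked with a short calculation. The dataset size of 176 comes from the abstract; the correct-answer counts of 29 and 68 are not stated there and are inferred here as the integers consistent with the reported percentages, so treat them as an assumption.

```python
TOTAL_CASES = 176  # dataset size stated in the abstract

def accuracy_pct(correct, total=TOTAL_CASES):
    """Return accuracy as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)

# Assumed counts: inferred from the reported percentages, not given in the abstract.
low_correct, high_correct = 29, 68

low = accuracy_pct(low_correct)
high = accuracy_pct(high_correct)
ratio = high_correct / low_correct

print(low, high, round(ratio, 2))  # prints 16.48 38.64 2.34
```

Under these assumed counts, the ratio of about 2.34 matches the "2.3 times improvement" quoted in the abstract.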
