Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Philip Wootaek Shin

When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

May 06, 2026

Philip Wootaek Shin, Ajay Narayanan Sridhar, Sivani Devarapalli, Rui Zhang, Jack Sampson, Vijaykrishnan Narayanan

Abstract:Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, which requires accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while they offer partial improvements, they do not fully resolve hallucinations. Our results reveal a gap between perceptual robustness and relational understanding, highlighting the need for more robust, geometry-aware VLMs.

Via

Access Paper or Ask Questions

Disharmony: Forensics using Reverse Lighting Harmonization

Jan 17, 2025

Philip Wootaek Shin, Jack Sampson, Vijaykrishnan Narayanan, Andres Marquez, Mahantesh Halappanavar

Figure 1 for Disharmony: Forensics using Reverse Lighting Harmonization

Figure 2 for Disharmony: Forensics using Reverse Lighting Harmonization

Figure 3 for Disharmony: Forensics using Reverse Lighting Harmonization

Figure 4 for Disharmony: Forensics using Reverse Lighting Harmonization

Abstract:Content generation and manipulation approaches based on deep learning methods have seen significant advancements, leading to an increased need for techniques to detect whether an image has been generated or edited. Another area of research focuses on the insertion and harmonization of objects within images. In this study, we explore the potential of using harmonization data in conjunction with a segmentation model to enhance the detection of edited image regions. These edits can be either manually crafted or generated using deep learning methods. Our findings demonstrate that this approach can effectively identify such edits. Existing forensic models often overlook the detection of harmonized objects in relation to the background, but our proposed Disharmony Network addresses this gap. By utilizing an aggregated dataset of harmonization techniques, our model outperforms existing forensic networks in identifying harmonized objects integrated into their backgrounds, and shows potential for detecting various forms of edits, including virtual try-on tasks.

Via

Access Paper or Ask Questions

Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Jun 09, 2024

Philip Wootaek Shin, Jihyun Janice Ahn, Wenpeng Yin, Jack Sampson, Vijaykrishnan Narayanan

Figure 1 for Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Figure 2 for Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Figure 3 for Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Figure 4 for Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Abstract:It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform/systematic agreed standard to control/adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and their sequencing, we uncover the nuanced ways these AI technologies encode biases across gender, race, geography, and region/culture. Our findings reveal the challenges and potential of prompt engineering in controlling biases, highlighting the critical need for ethical AI development promoting diversity and inclusivity. This work advances AI ethics by not only revealing the nuanced dynamics of bias in text-to-image generation models but also by offering a novel framework for future research in controlling bias. Our contributions-panning comparative analyses, the strategic use of prompt modifiers, the exploration of prompt sequencing effects, and the introduction of a bias sensitivity taxonomy-lay the groundwork for the development of common metrics and standard analyses for evaluating whether and how future AI models exhibit and respond to requests to adjust for inherent biases.

Via

Access Paper or Ask Questions