Abstract: Compressed vision-language models (VLMs) are widely used to reduce memory and compute costs, making them well suited to real-world deployment. However, compression raises concerns about whether internal computations and safety behaviors are preserved. In this work, we use causal circuit analysis and crosscoder-based feature comparisons to examine how pruning and quantization alter the internals of representative VLMs. We observe that pruning generally keeps circuit structure intact but rotates and attenuates internal features, whereas quantization modifies circuits more substantially yet leaves the surviving features better aligned. Leveraging this insight, we also introduce VLMSafe-420, a benchmark that pairs harmful inputs with matched benign counterfactuals across a range of safety categories. Our findings show that pruning causes a sharp drop in genuine refusal behavior, suggesting that the choice of compression method has safety implications.




Abstract: On 26 January 2021, India witnessed a national embarrassment from a demographic least expected to cause one: farmers. People across the nation watched in horror as a pseudo-patriotic mob of farmers stormed the capital, Delhi, and vandalized a symbol of national pride, the Red Fort. Investigations that followed the event revealed a social media trail that led up to it. Consequently, it became necessary to archive this trail for social media analysis, not only to understand the breadcrumbs dispersed across it but also to examine the role played by misinformation and fake news in the event. In this paper, we propose the tractor2twitter dataset, which contains around 0.05 million tweets posted before, during, and after the event. We also benchmark our dataset with an explainable AI model that classifies each tweet into one of three categories: disinformation, misinformation, or opinion.