Jaisidh Singh

Explaining Grokking in Transformers through the Lens of Inductive Bias

Feb 06, 2026

Jaisidh Singh, Diganta Misra, Antonio Orvieto

Abstract: We investigate grokking in transformers through the lens of inductive bias: dispositions arising from architecture or optimization that let the network prefer one solution over another. We first show that architectural choices such as the position of Layer Normalization (LN) strongly modulate grokking speed. This modulation is explained by isolating how LN on specific pathways shapes shortcut-learning and attention entropy. Subsequently, we study how different optimization settings modulate grokking, inducing distinct interpretations of previously proposed controls such as readout scale. In particular, we find that using readout scale as a control for lazy training can be confounded by learning rate and weight decay in our setting. Accordingly, we show that features evolve continuously throughout training, suggesting that grokking in transformers can be more nuanced than a lazy-to-rich transition of the learning regime. Finally, we show how generalization predictably emerges with feature compressibility in grokking, across different modulators of inductive bias. Our code is released at https://tinyurl.com/y52u3cad.

* Total 15 pages, 9 figures

Automatic Discovery and Assessment of Interpretable Systematic Errors in Semantic Segmentation

Nov 16, 2024

Jaisidh Singh, Sonam Singh, Amit Arvind Kale, Harsh K Gandhi

Abstract: This paper presents a novel method for discovering systematic errors in segmentation models. For instance, a systematic error in a segmentation model can be a sufficiently large number of misclassifications of the target class of pedestrians as parking meters. With the rapid deployment of these models in critical applications such as autonomous driving, it is vital to detect and interpret these systematic errors. However, the key challenge is automatically discovering such failures on unlabelled data and forming interpretable semantic sub-groups for intervention. For this, we leverage multimodal foundation models to retrieve errors and use conceptual linkage along with erroneous nature to study the systematic nature of these errors. We demonstrate that such errors are present in SOTA segmentation models (UperNet ConvNeXt and UperNet Swin) trained on the Berkeley Deep Drive dataset and benchmark the approach qualitatively and quantitatively, showing its effectiveness by discovering coherent systematic errors for these models. Our work opens up avenues for model analysis and intervention that have so far been underexplored in semantic segmentation.

* 7 pages main paper (without references), total 13 pages & 9 figures

Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations

Mar 29, 2024

Jaisidh Singh, Ishaan Shrivastava, Mayank Vatsa, Richa Singh, Aparna Bharati

Abstract: Existing vision-language models (VLMs) treat text descriptions as a unit, confusing individual concepts in a prompt and impairing visual semantic matching and reasoning. An important aspect of reasoning in logic and language is negation. This paper highlights the limitations of popular VLMs such as CLIP at understanding the implications of negations, i.e., the effect of the word "not" in a given prompt. To enable evaluation of VLMs on fluent prompts with negations, we present CC-Neg, a dataset containing 228,246 images, true captions, and their corresponding negated captions. Using CC-Neg along with modifications to the contrastive loss of CLIP, our proposed CoN-CLIP framework has an improved understanding of negations. This training paradigm improves CoN-CLIP's ability to encode semantics reliably, resulting in a 3.85% average gain in top-1 accuracy for zero-shot image classification across 8 datasets. Further, CoN-CLIP outperforms CLIP on challenging compositionality benchmarks such as SugarCREPE by 4.4%, showcasing emergent compositional understanding of objects, relations, and attributes in text. Overall, our work addresses a crucial limitation of VLMs by introducing a dataset and framework that strengthens semantic associations between images and text, demonstrating improved large-scale foundation models with significantly reduced computational cost, promoting efficiency and accessibility.

* 14 pages + 6 figures in main manuscript (excluding references)
