Teck-Yian Lim

Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer

Aug 19, 2025

Md Ashiqur Rahman, Chiao-An Yang, Michael N. Cheng, Lim Jun Hao, Jeremiah Jiang, Teck-Yian Lim, Raymond A. Yeh

Abstract: Scale variation is a fundamental challenge in computer vision. Objects of the same class can have different sizes, and their perceived size is further affected by the distance from the camera. These variations are local to the objects, i.e., different object sizes may change differently within the same image. To effectively handle scale variations, we present a deep equilibrium canonicalizer (DEC) to improve the local scale equivariance of a model. DEC can be easily incorporated into existing network architectures and can be adapted to a pre-trained model. Notably, we show that on the competitive ImageNet benchmark, DEC improves both model performance and local scale consistency across four popular pre-trained deep-nets, namely, ViT, DeiT, Swin, and BEiT. Our code is available at https://github.com/ashiq24/local-scale-equivariance.

Making Vision Transformers Truly Shift-Equivariant

May 25, 2023

Renan A. Rojas-Gomez, Teck-Yian Lim, Minh N. Do, Raymond A. Yeh

Abstract: For computer vision tasks, Vision Transformers (ViTs) have become one of the go-to deep net architectures. Despite being inspired by Convolutional Neural Networks (CNNs), ViTs remain sensitive to small shifts in the input image. To address this, we introduce novel designs for each of the modules in ViTs, such as tokenization, self-attention, patch merging, and positional encoding. With our proposed modules, we achieve truly shift-equivariant ViTs on four well-established models, namely, Swin, SwinV2, MViTv2, and CvT, both in theory and practice. Empirically, we tested these models on image classification and semantic segmentation, achieving competitive performance across three different datasets while maintaining 100% shift consistency.
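
The "shift consistency" figure quoted in the abstract above can be illustrated with a small sketch. This is not the authors' evaluation code: it assumes consistency means the fraction of inputs whose predicted label is unchanged by a random circular shift, and the helpers `shift_consistency`, `invariant_predict`, and `sensitive_predict` are hypothetical toy names introduced only for illustration.

```python
import numpy as np

def shift_consistency(predict, images, max_shift=3, seed=0):
    """Fraction of images whose predicted label survives a random circular shift.

    Assumption: this is one plausible reading of "shift consistency";
    the paper's exact protocol may differ.
    """
    rng = np.random.default_rng(seed)
    agree = 0
    for img in images:
        # Draw a random vertical and horizontal circular shift.
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(img, shift=(int(dy), int(dx)), axis=(0, 1))
        agree += int(predict(img) == predict(shifted))
    return agree / len(images)

def invariant_predict(img):
    # Toy classifier: the pixel sum is unchanged by circular shifts.
    return int(img.sum() > 128 * img.size)

def sensitive_predict(img):
    # Toy classifier: depends on a single pixel, so shifts can flip the label.
    return int(img[0, 0] > 128)

# Integer-valued toy images keep the sums exact.
images = np.random.default_rng(1).integers(0, 256, size=(20, 8, 8))
invariant_score = shift_consistency(invariant_predict, images)
sensitive_score = shift_consistency(sensitive_predict, images)
```

Under these assumptions `invariant_score` is exactly 1.0, which corresponds to the 100% consistency described above, while `sensitive_score` is typically lower.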

Learnable Polyphase Sampling for Shift Invariant and Equivariant Convolutional Networks

Oct 14, 2022

Renan A. Rojas-Gomez, Teck-Yian Lim, Alexander G. Schwing, Minh N. Do, Raymond A. Yeh

Abstract: We propose learnable polyphase sampling (LPS), a pair of learnable down/upsampling layers that enable truly shift-invariant and equivariant convolutional networks. LPS can be trained end-to-end from data and generalizes existing handcrafted downsampling layers. It is widely applicable as it can be integrated into any convolutional network by replacing down/upsampling layers. We evaluate LPS on image classification and semantic segmentation. Experiments show that LPS is on par with or outperforms existing methods in both performance and shift consistency. For the first time, we achieve true shift-equivariance on semantic segmentation (PASCAL VOC), i.e., 100% shift consistency, outperforming baselines by an absolute 3.3%.

* Accepted at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
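
The polyphase idea behind the LPS abstract above can be sketched as follows. This is not LPS itself: instead of a learned selection, it assumes a handcrafted rule that keeps the polyphase component with the largest energy, and the function names `polyphase_components` and `max_norm_downsample` are hypothetical. It compares that rule with naive strided downsampling under a circular shift.

```python
import numpy as np

def polyphase_components(x, stride=2):
    """Split a 2D array into its stride*stride polyphase components."""
    return [x[i::stride, j::stride] for i in range(stride) for j in range(stride)]

def max_norm_downsample(x, stride=2):
    """Keep the polyphase component with the largest energy.

    Assumption: a handcrafted selection rule used for illustration;
    LPS would learn the selection from data instead.
    """
    comps = polyphase_components(x, stride)
    energies = [float(np.sum(c.astype(np.float64) ** 2)) for c in comps]
    return comps[int(np.argmax(energies))]

x = np.arange(64).reshape(8, 8)
shifted = np.roll(x, shift=(1, 1), axis=(0, 1))  # circular shift by one pixel

# Naive striding keeps a different set of pixels once the input is shifted.
naive_same = np.array_equal(
    np.sort(x[::2, ::2], axis=None),
    np.sort(shifted[::2, ::2], axis=None),
)

# Energy-based selection keeps the same pixels, up to a circular shift.
adaptive_same = np.array_equal(
    np.sort(max_norm_downsample(x), axis=None),
    np.sort(max_norm_downsample(shifted), axis=None),
)
```

Here `naive_same` is `False` and `adaptive_same` is `True`: a one-pixel shift only permutes the polyphase components, so a selection rule that depends on the components themselves, not on their position, keeps the same values.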
