Chaeyun Kim

Towards Motion-aware Referring Image Segmentation

Mar 18, 2026

Chaeyun Kim, Seunghoon Yi, Yejin Kim, Yohan Jo, Joonseok Lee

Abstract: Referring Image Segmentation (RIS) requires identifying objects in images based on textual descriptions. We observe that existing methods significantly underperform on motion-related queries compared to appearance-based ones. To address this, we first introduce an efficient data augmentation scheme that extracts motion-centric phrases from the original captions, exposing models to more motion expressions without additional annotations. Second, since the same object can be described differently depending on the context, we propose Multimodal Radial Contrastive Learning (MRaCL), performed on fused image-text embeddings rather than unimodal representations. For comprehensive evaluation, we introduce a new test split focusing on motion-centric queries and a new benchmark, M-Bench, where objects are distinguished primarily by actions. Extensive experiments show that our method substantially improves performance on motion-centric queries across multiple RIS models while maintaining competitive results on appearance-based descriptions. Code is available at https://github.com/snuviplab/MRaCL

* Accepted at AISTATS 2026. * Equal contribution
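
To make the idea of contrastive learning on fused image-text embeddings concrete, here is a minimal sketch. The abstract does not give the MRaCL loss, so the sketch substitutes a standard InfoNCE loss with cosine similarity and should not be read as the authors' formulation. The function names, the temperature value, and the pairing scheme (anchor and positive are fused embeddings of the same image with two descriptions of the same object; negatives are fused embeddings with other expressions) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss on already-fused embeddings (hypothetical stand-in).

    anchor, positive: fused image-text embeddings assumed to describe the same object.
    negatives: list of fused embeddings assumed to describe other objects.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]
```

The loss is small when the anchor is close to its positive and far from the negatives, and it grows when a negative is closer than the positive.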

Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation

Nov 03, 2024

Seongsu Ha, Chaeyun Kim, Donghwa Kim, Junho Lee, Sangho Lee, Joonseok Lee

Abstract: Referring Image Segmentation (RIS) is a comprehensive task to segment an object referred to by a textual query from an image. By nature, the level of difficulty in this task is affected by the existence of similar objects and the complexity of the referring expression. Recent RIS models still show a significant performance gap between easy and hard scenarios. We posit that the bottleneck exists in the data, and propose a simple but powerful data augmentation method, Negative-mined Mosaic Augmentation (NeMo). This method augments a training image into a mosaic with three other negative images carefully curated by a pretrained multimodal alignment model, e.g., CLIP, to make the sample more challenging. We discover that it is critical to properly adjust the difficulty level, so that samples are neither too ambiguous nor too trivial. The augmented training data encourages the RIS model to recognize subtle differences and relationships between similar visual entities and to concretely understand the whole expression in order to better locate the right target. Our approach shows consistent improvements on various datasets and models, verified by extensive experiments.

* Accepted at ECCV 2024. Project page: https://dddonghwa.github.io/NeMo/
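
The mosaic construction described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the similarity scores are assumed to be precomputed by some image-text alignment model (no CLIP call is made here), the similarity band `lower`/`upper` is a hypothetical stand-in for the difficulty adjustment, and images are plain nested lists that are assumed to share one size already.

```python
import random


def select_negatives(similarities, lower, upper, k=3, seed=0):
    """Pick k candidate indices whose query similarity lies in [lower, upper].

    The band is a hypothetical way to skip candidates that are too similar
    (ambiguous) or too dissimilar (trivial).
    """
    band = [i for i, s in enumerate(similarities) if lower <= s <= upper]
    if len(band) < k:
        raise ValueError("not enough candidates inside the similarity band")
    return random.Random(seed).sample(band, k)


def make_mosaic(target, target_mask, negatives, position=0):
    """Tile one target and three negatives (H x W nested lists) into a 2x2 grid.

    position selects the target's quadrant: 0 top-left, 1 top-right,
    2 bottom-left, 3 bottom-right. Returns (mosaic_image, mosaic_mask);
    the mask is zero everywhere outside the target's quadrant.
    """
    if len(negatives) != 3:
        raise ValueError("exactly three negative images are required")
    h, w = len(target), len(target[0])
    tiles = list(negatives)
    tiles.insert(position, target)
    empty = [[0] * w for _ in range(h)]
    masks = [empty, empty, empty]
    masks.insert(position, target_mask)

    def tile(quads):
        # Concatenate rows side by side, then stack the two halves.
        top = [a + b for a, b in zip(quads[0], quads[1])]
        bottom = [c + d for c, d in zip(quads[2], quads[3])]
        return top + bottom

    return tile(tiles), tile(masks)
```

In a real pipeline the four images would be resized to a common shape before tiling, and the referring expression would stay unchanged while its ground-truth mask moves to the chosen quadrant.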
