Abstract:While foundation models drive steady progress in image segmentation and diffusion algorithms compose always more realistic images, the seemingly simple problem of identifying recurrent patterns in a collection of images remains very much open. In this paper, we focus on sprite-based image decomposition models, which have shown some promise for clustering and image decomposition and are appealing because of their high interpretability. These models come in different flavors, need to be tailored to specific datasets, and struggle to scale to images with many objects. We dive into the details of their design, identify their core components, and perform an extensive analysis on clustering benchmarks. We leverage this analysis to propose a deep sprite-based image decomposition method that performs on par with state-of-the-art unsupervised class-aware image segmentation methods on the standard CLEVR benchmark, scales linearly with the number of objects, identifies explicitly object categories, and fully models images in an easily interpretable way.
Abstract:Imbalance in classification tasks is commonly quantified by the cardinalities of examples across classes. This, however, disregards the presence of redundant examples and inherent differences in the learning difficulties of classes. Alternatively, one can use complex measures such as training loss and uncertainty, which, however, depend on training a machine learning model. Our paper proposes using data Intrinsic Dimensionality (ID) as an easy-to-compute, model-free measure of imbalance that can be seamlessly incorporated into various imbalance mitigation methods. Our results across five different datasets with a diverse range of imbalance ratios show that ID consistently outperforms cardinality-based re-weighting and re-sampling techniques used in the literature. Moreover, we show that combining ID with cardinality can further improve performance. Code: https://github.com/cagries/IDIM.




Abstract:This paper introduces Generalized Mask-aware Intersection-over-Union (GmaIoU) as a new measure for positive-negative assignment of anchor boxes during training of instance segmentation methods. Unlike conventional IoU measure or its variants, which only consider the proximity of anchor and ground-truth boxes; GmaIoU additionally takes into account the segmentation mask. This enables GmaIoU to provide more accurate supervision during training. We demonstrate the effectiveness of GmaIoU by replacing IoU with our GmaIoU in ATSS, a state-of-the-art (SOTA) assigner. Then, we train YOLACT, a real-time instance segmentation method, using our GmaIoU-based ATSS assigner. The resulting YOLACT based on the GmaIoU assigner outperforms (i) ATSS with IoU by $\sim 1.0-1.5$ mask AP, (ii) YOLACT with a fixed IoU threshold assigner by $\sim 1.5-2$ mask AP over different image sizes and (iii) decreases the inference time by $25 \%$ owing to using less anchors. Taking advantage of this efficiency, we further devise GmaYOLACT, a faster and $+7$ mask AP points more accurate detector than YOLACT. Our best model achieves $38.7$ mask AP at $26$ fps on COCO test-dev establishing a new state-of-the-art for real-time instance segmentation.