Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sinan Kalkan

KOVAN Research Lab, Dept. of Computer Engineering, Middle East Technical University, Ankara, Turkey

Generalized Mask-aware IoU for Anchor Assignment for Real-time Instance Segmentation

Dec 28, 2023

Barış Can Çam, Kemal Öksüz, Fehmi Kahraman, Zeynep Sonat Baltacı, Sinan Kalkan, Emre Akbaş

Abstract:This paper introduces Generalized Mask-aware Intersection-over-Union (GmaIoU) as a new measure for positive-negative assignment of anchor boxes during training of instance segmentation methods. Unlike conventional IoU measure or its variants, which only consider the proximity of anchor and ground-truth boxes; GmaIoU additionally takes into account the segmentation mask. This enables GmaIoU to provide more accurate supervision during training. We demonstrate the effectiveness of GmaIoU by replacing IoU with our GmaIoU in ATSS, a state-of-the-art (SOTA) assigner. Then, we train YOLACT, a real-time instance segmentation method, using our GmaIoU-based ATSS assigner. The resulting YOLACT based on the GmaIoU assigner outperforms (i) ATSS with IoU by $\sim 1.0-1.5$ mask AP, (ii) YOLACT with a fixed IoU threshold assigner by $\sim 1.5-2$ mask AP over different image sizes and (iii) decreases the inference time by $25 \%$ owing to using less anchors. Taking advantage of this efficiency, we further devise GmaYOLACT, a faster and $+7$ mask AP points more accurate detector than YOLACT. Our best model achieves $38.7$ mask AP at $26$ fps on COCO test-dev establishing a new state-of-the-art for real-time instance segmentation.

* 28 pages, 4 figures

Via

Access Paper or Ask Questions

Uncertainty-based Fairness Measures

Dec 18, 2023

Selim Kuzucu, Jiaee Cheong, Hatice Gunes, Sinan Kalkan

Abstract:Unfair predictions of machine learning (ML) models impede their broad acceptance in real-world settings. Tackling this arduous challenge first necessitates defining what it means for an ML model to be fair. This has been addressed by the ML community with various measures of fairness that depend on the prediction outcomes of the ML models, either at the group level or the individual level. These fairness measures are limited in that they utilize point predictions, neglecting their variances, or uncertainties, making them susceptible to noise, missingness and shifts in data. In this paper, we first show that an ML model may appear to be fair with existing point-based fairness measures but biased against a demographic group in terms of prediction uncertainties. Then, we introduce new fairness measures based on different types of uncertainties, namely, aleatoric uncertainty and epistemic uncertainty. We demonstrate on many datasets that (i) our uncertainty-based measures are complementary to existing measures of fairness, and (ii) they provide more insights about the underlying issues leading to bias.

Via

Access Paper or Ask Questions

Correlation Loss: Enforcing Correlation between Classification and Localization

Jan 03, 2023

Fehmi Kahraman, Kemal Oksuz, Sinan Kalkan, Emre Akbas

Abstract:Object detectors are conventionally trained by a weighted sum of classification and localization losses. Recent studies (e.g., predicting IoU with an auxiliary head, Generalized Focal Loss, Rank & Sort Loss) have shown that forcing these two loss terms to interact with each other in non-conventional ways creates a useful inductive bias and improves performance. Inspired by these works, we focus on the correlation between classification and localization and make two main contributions: (i) We provide an analysis about the effects of correlation between classification and localization tasks in object detectors. We identify why correlation affects the performance of various NMS-based and NMS-free detectors, and we devise measures to evaluate the effect of correlation and use them to analyze common detectors. (ii) Motivated by our observations, e.g., that NMS-free detectors can also benefit from correlation, we propose Correlation Loss, a novel plug-in loss function that improves the performance of various object detectors by directly optimizing correlation coefficients: E.g., Correlation Loss on Sparse R-CNN, an NMS-free method, yields 1.6 AP gain on COCO and 1.8 AP gain on Cityscapes dataset. Our best model on Sparse R-CNN reaches 51.0 AP without test-time augmentation on COCO test-dev, reaching state-of-the-art. Code is available at https://github.com/fehmikahraman/CorrLoss

* Accepted to AAAI 2023

Via

Access Paper or Ask Questions

AssembleRL: Learning to Assemble Furniture from Their Point Clouds

Sep 15, 2022

Özgür Aslan, Burak Bolat, Batuhan Bal, Tuğba Tümer, Erol Şahin, Sinan Kalkan

Figure 1 for AssembleRL: Learning to Assemble Furniture from Their Point Clouds

Figure 2 for AssembleRL: Learning to Assemble Furniture from Their Point Clouds

Figure 3 for AssembleRL: Learning to Assemble Furniture from Their Point Clouds

Figure 4 for AssembleRL: Learning to Assemble Furniture from Their Point Clouds

Abstract:The rise of simulation environments has enabled learning-based approaches for assembly planning, which is otherwise a labor-intensive and daunting task. Assembling furniture is especially interesting since furniture are intricate and pose challenges for learning-based approaches. Surprisingly, humans can solve furniture assembly mostly given a 2D snapshot of the assembled product. Although recent years have witnessed promising learning-based approaches for furniture assembly, they assume the availability of correct connection labels for each assembly step, which are expensive to obtain in practice. In this paper, we alleviate this assumption and aim to solve furniture assembly with as little human expertise and supervision as possible. To be specific, we assume the availability of the assembled point cloud, and comparing the point cloud of the current assembly and the point cloud of the target product, obtain a novel reward signal based on two measures: Incorrectness and incompleteness. We show that our novel reward signal can train a deep network to successfully assemble different types of furniture. Code and networks available here: https://github.com/METU-KALFA/AssembleRL

* 6 pages, 6 figures, iros2022

Via

Access Paper or Ask Questions

Segment Augmentation and Differentiable Ranking for Logo Retrieval

Sep 13, 2022

Feyza Yavuz, Sinan Kalkan

Abstract:Logo retrieval is a challenging problem since the definition of similarity is more subjective compared to image retrieval tasks and the set of known similarities is very scarce. To tackle this challenge, in this paper, we propose a simple but effective segment-based augmentation strategy to introduce artificially similar logos for training deep networks for logo retrieval. In this novel augmentation strategy, we first find segments in a logo and apply transformations such as rotation, scaling, and color change, on the segments, unlike the conventional image-level augmentation strategies. Moreover, we evaluate whether the recently introduced ranking-based loss function, Smooth-AP, is a better approach for learning similarity for logo retrieval. On the large scale METU Trademark Dataset, we show that (i) our segment-based augmentation strategy improves retrieval performance compared to the baseline model or image-level augmentation strategies, and (ii) Smooth-AP indeed performs better than conventional losses for logo retrieval.

* ICPR2022, Poster Presentation

Via

Access Paper or Ask Questions

Does depth estimation help object detection?

Apr 13, 2022

Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas

Figure 1 for Does depth estimation help object detection?

Figure 2 for Does depth estimation help object detection?

Figure 3 for Does depth estimation help object detection?

Figure 4 for Does depth estimation help object detection?

Abstract:Ground-truth depth, when combined with color data, helps improve object detection accuracy over baseline models that only use color. However, estimated depth does not always yield improvements. Many factors affect the performance of object detection when estimated depth is used. In this paper, we comprehensively investigate these factors with detailed experiments, such as using ground-truth vs. estimated depth, effects of different state-of-the-art depth estimation networks, effects of using different indoor and outdoor RGB-D datasets as training data for depth estimation, and different architectural choices for integrating depth to the base object detector network. We propose an early concatenation strategy of depth, which yields higher mAP than previous works' while using significantly fewer parameters.

* Accepted to Image and Vision Computing

Via

Access Paper or Ask Questions

Mask-aware IoU for Anchor Assignment in Real-time Instance Segmentation

Oct 19, 2021

Kemal Oksuz, Baris Can Cam, Fehmi Kahraman, Zeynep Sonat Baltaci, Sinan Kalkan, Emre Akbas

Figure 1 for Mask-aware IoU for Anchor Assignment in Real-time Instance Segmentation

Figure 2 for Mask-aware IoU for Anchor Assignment in Real-time Instance Segmentation

Figure 3 for Mask-aware IoU for Anchor Assignment in Real-time Instance Segmentation

Figure 4 for Mask-aware IoU for Anchor Assignment in Real-time Instance Segmentation

Abstract:This paper presents Mask-aware Intersection-over-Union (maIoU) for assigning anchor boxes as positives and negatives during training of instance segmentation methods. Unlike conventional IoU or its variants, which only considers the proximity of two boxes; maIoU consistently measures the proximity of an anchor box with not only a ground truth box but also its associated ground truth mask. Thus, additionally considering the mask, which, in fact, represents the shape of the object, maIoU enables a more accurate supervision during training. We present the effectiveness of maIoU on a state-of-the-art (SOTA) assigner, ATSS, by replacing IoU operation by our maIoU and training YOLACT, a SOTA real-time instance segmentation method. Using ATSS with maIoU consistently outperforms (i) ATSS with IoU by $\sim 1$ mask AP, (ii) baseline YOLACT with fixed IoU threshold assigner by $\sim 2$ mask AP over different image sizes and (iii) decreases the inference time by $25 \%$ owing to using less anchors. Then, exploiting this efficiency, we devise maYOLACT, a faster and $+6$ AP more accurate detector than YOLACT. Our best model achieves $37.7$ mask AP at $25$ fps on COCO test-dev establishing a new state-of-the-art for real-time instance segmentation. Code is available at https://github.com/kemaloksuz/Mask-aware-IoU

* BMVC 2021, camera ready version

Via

Access Paper or Ask Questions

Rank & Sort Loss for Object Detection and Instance Segmentation

Jul 24, 2021

Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

Figure 1 for Rank & Sort Loss for Object Detection and Instance Segmentation

Figure 2 for Rank & Sort Loss for Object Detection and Instance Segmentation

Figure 3 for Rank & Sort Loss for Object Detection and Instance Segmentation

Figure 4 for Rank & Sort Loss for Object Detection and Instance Segmentation

Abstract:We propose Rank & Sort (RS) Loss, as a ranking-based loss function to train deep object detection and instance segmentation methods (i.e. visual detectors). RS Loss supervises the classifier, a sub-network of these methods, to rank each positive above all negatives as well as to sort positives among themselves with respect to (wrt.) their continuous localisation qualities (e.g. Intersection-over-Union - IoU). To tackle the non-differentiable nature of ranking and sorting, we reformulate the incorporation of error-driven update with backpropagation as Identity Update, which enables us to model our novel sorting error among positives. With RS Loss, we significantly simplify training: (i) Thanks to our sorting objective, the positives are prioritized by the classifier without an additional auxiliary head (e.g. for centerness, IoU, mask-IoU), (ii) due to its ranking-based nature, RS Loss is robust to class imbalance, and thus, no sampling heuristic is required, and (iii) we address the multi-task nature of visual detectors using tuning-free task-balancing coefficients. Using RS Loss, we train seven diverse visual detectors only by tuning the learning rate, and show that it consistently outperforms baselines: e.g. our RS Loss improves (i) Faster R-CNN by ~ 3 box AP and aLRP Loss (ranking-based baseline) by ~ 2 box AP on COCO dataset, (ii) Mask R-CNN with repeat factor sampling (RFS) by 3.5 mask AP (~ 7 AP for rare classes) on LVIS dataset; and also outperforms all counterparts. Code available at https://github.com/kemaloksuz/RankSortLoss

* ICCV 2021, oral presentation

Via

Access Paper or Ask Questions

One Metric to Measure them All: Localisation Recall Precision for Evaluating Visual Detection Tasks

Nov 21, 2020

Kemal Oksuz, Baris Can Cam, Sinan Kalkan, Emre Akbas

Figure 1 for One Metric to Measure them All: Localisation Recall Precision for Evaluating Visual Detection Tasks

Figure 2 for One Metric to Measure them All: Localisation Recall Precision for Evaluating Visual Detection Tasks

Figure 3 for One Metric to Measure them All: Localisation Recall Precision for Evaluating Visual Detection Tasks

Figure 4 for One Metric to Measure them All: Localisation Recall Precision for Evaluating Visual Detection Tasks

Abstract:Despite being widely used as a performance measure for visual detection tasks, Average Precision (AP) is limited in (i) including localisation quality, (ii) interpretability and (iii) applicability to outputs without confidence scores. Panoptic Quality (PQ), a measure proposed for evaluating panoptic segmentation (Kirillov et al., 2019), does not suffer from these limitations but is limited to panoptic segmentation. In this paper, we propose Localisation Recall Precision (LRP) Error as the performance measure for all visual detection tasks. LRP Error, initially proposed only for object detection by Oksuz et al. (2018), does not suffer from the aforementioned limitations and is applicable to all visual detection tasks. We also introduce Optimal LRP (oLRP) Error as the minimum LRP error obtained over confidence scores to evaluate visual detectors and obtain optimal thresholds for deployment. We provide a detailed comparative analysis of LRP with AP and PQ, and use 35 state-of-the-art visual detectors from four common visual detection tasks (i.e. object detection, keypoint detection, instance segmentation and panoptic segmentation) to empirically show that LRP provides richer and more discriminative information than its counterparts. Code available at: https://github.com/kemaloksuz/LRP-Error

* Under review at TPAMI

Via

Access Paper or Ask Questions

Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks

Nov 17, 2020

Nikhil Churamani, Sinan Kalkan, Hatice Gunes

Abstract:Most state-of-the-art approaches for Facial Action Unit (AU) detection rely upon evaluating facial expressions from static frames, encoding a snapshot of heightened facial activity. In real-world interactions, however, facial expressions are usually more subtle and evolve in a temporal manner requiring AU detection models to learn spatial as well as temporal information. In this paper, we focus on both spatial and spatio-temporal features encoding the temporal evolution of facial AU activation. For this purpose, we propose the Action Unit Lifecycle-Aware Capsule Network (AULA-Caps) that performs AU detection using both frame and sequence-level features. While at the frame-level the capsule layers of AULA-Caps learn spatial feature primitives to determine AU activations, at the sequence-level, it learns temporal dependencies between contiguous frames by focusing on relevant spatio-temporal segments in the sequence. The learnt feature capsules are routed together such that the model learns to selectively focus more on spatial or spatio-temporal information depending upon the AU lifecycle. The proposed model is evaluated on the commonly used BP4D and GFT benchmark datasets obtaining state-of-the-art results on both the datasets.

Via

Access Paper or Ask Questions