Abstract:Occlusion remains a significant challenge for current vision models to robustly interpret complex and dense real-world images and scenes. To address this limitation and to enable accurate prediction of the occlusion order relationship between objects, we propose leveraging the advanced capability of a pre-trained GPT-4 model to deduce the order. By providing a specifically designed prompt along with the input image, GPT-4 can analyze the image and generate order predictions. The response can then be parsed to construct an occlusion matrix which can be utilized in assisting with other occlusion handling tasks and image understanding. We report the results of evaluating the model on COCOA and InstaOrder datasets. The results show that by using semantic context, visual patterns, and commonsense knowledge, the model can produce more accurate order predictions. Unlike baseline methods, the model can reason about occlusion relationships in a zero-shot fashion, which requires no annotated training data and can easily be integrated into occlusion handling frameworks.
Abstract:We present a model to reconstruct partially visible objects. The model takes a mask as an input, which we call weighted mask. The mask is utilized by gated convolutions to assign more weight to the visible pixels of the occluded instance compared to the background, while ignoring the features of the invisible pixels. By drawing more attention from the visible region, our model can predict the invisible patch more effectively than the baseline models, especially in instances with uniform texture. The model is trained on COCOA dataset and two subsets of it in a self-supervised manner. The results demonstrate that our model generates higher quality and more texture-rich outputs compared to baseline models. Code is available at: https://github.com/KaziwaSaleh/mask-guided.
Abstract:The significant power of deep learning networks has led to enormous development in object detection. Over the last few years, object detector frameworks have achieved tremendous success in both accuracy and efficiency. However, their ability is far from that of human beings due to several factors, occlusion being one of them. Since occlusion can happen in various locations, scale, and ratio, it is very difficult to handle. In this paper, we address the challenges in occlusion handling in generic object detection in both outdoor and indoor scenes, then we refer to the recent works that have been carried out to overcome these challenges. Finally, we discuss some possible future directions of research.