Abstract:We present SLICK, a novel framework for precise and robust car damage segmentation that leverages structural priors and domain knowledge to tackle real-world automotive inspection challenges. SLICK introduces five key components: (1) Selective Part Segmentation using a high-resolution semantic backbone guided by structural priors to achieve surgical accuracy in segmenting vehicle parts even under occlusion, deformation, or paint loss; (2) Localization-Aware Attention blocks that dynamically focus on damaged regions, enhancing fine-grained damage detection in cluttered and complex street scenes; (3) an Instance-Sensitive Refinement head that leverages panoptic cues and shape priors to disentangle overlapping or adjacent parts, enabling precise boundary alignment; (4) Cross-Channel Calibration through multi-scale channel attention that amplifies subtle damage signals such as scratches and dents while suppressing noise like reflections and decals; and (5) a Knowledge Fusion Module that integrates synthetic crash data, part geometry, and real-world insurance datasets to improve generalization and handle rare cases effectively. Experiments on large-scale automotive datasets demonstrate SLICK's superior segmentation performance, robustness, and practical applicability for insurance and automotive inspection workflows.
Abstract:This paper introduces ALBERT, an instance segmentation model specifically designed for comprehensive car damage and part segmentation. Leveraging the power of Bidirectional Encoder Representations, ALBERT incorporates advanced localization mechanisms to accurately identify and differentiate between real and fake damages, as well as segment individual car parts. The model is trained on a large-scale, richly annotated automotive dataset that categorizes damage into 26 types, identifies 7 fake damage variants, and segments 61 distinct car parts. Our approach demonstrates strong performance in both segmentation accuracy and damage classification, paving the way for intelligent automotive inspection and assessment applications.
Abstract:Text recognition in natural images remains a challenging yet essential task, with broad applications spanning computer vision and natural language processing. This paper introduces a novel end-to-end framework that combines ResNet and Vision Transformer backbones with advanced methodologies, including Deformable Convolutions, Retrieval-Augmented Generation, and Conditional Random Fields (CRF). These innovations collectively enhance feature representation and improve Optical Character Recognition (OCR) performance. Specifically, the framework substitutes standard convolution layers in the third and fourth blocks with Deformable Convolutions, leverages adaptive dropout for regularization, and incorporates CRF for more refined sequence modeling. Extensive experiments conducted on six benchmark datasets IC13, IC15, SVT, IIIT5K, SVTP, and CUTE80 validate the proposed method's efficacy, achieving notable accuracies: 97.32% on IC13, 58.26% on IC15, 88.10% on SVT, 74.13% on IIIT5K, 82.17% on SVTP, and 66.67% on CUTE80, resulting in an average accuracy of 77.77%. These results establish a new state-of-the-art for text recognition, demonstrating the robustness of the approach across diverse and challenging datasets.
Abstract:This paper introduces a novel framework for detecting and segmenting critical road assets on Thai highways using an advanced Refined Generalized Focal Loss (REG) formulation. Integrated into state-of-the-art vision-based detection and segmentation models, the proposed method effectively addresses class imbalance and the challenges of localizing small, underrepresented road elements, including pavilions, pedestrian bridges, information signs, single-arm poles, bus stops, warning signs, and concrete guardrails. To improve both detection and segmentation accuracy, a multi-task learning strategy is adopted, optimizing REG across multiple tasks. REG is further enhanced by incorporating a spatial-contextual adjustment term, which accounts for the spatial distribution of road assets, and a probabilistic refinement that captures prediction uncertainty in complex environments, such as varying lighting conditions and cluttered backgrounds. Our rigorous mathematical formulation demonstrates that REG minimizes localization and classification errors by applying adaptive weighting to hard-to-detect instances while down-weighting easier examples. Experimental results show a substantial performance improvement, achieving a mAP50 of 80.34 and an F1-score of 77.87, significantly outperforming conventional methods. This research underscores the capability of advanced loss function refinements to enhance the robustness and accuracy of road asset detection and segmentation, thereby contributing to improved road safety and infrastructure management. For an in-depth discussion of the mathematical background and related methods, please refer to previous work available at \url{https://github.com/kaopanboonyuen/REG}.
Abstract:Evaluating car damages from misfortune is critical to the car insurance industry. However, the accuracy is still insufficient for real-world applications since the deep learning network is not designed for car damage images as inputs, and its segmented masks are still very coarse. This paper presents MARS (Mask Attention Refinement with Sequential quadtree nodes) for car damage instance segmentation. Our MARS represents self-attention mechanisms to draw global dependencies between the sequential quadtree nodes layer and quadtree transformer to recalibrate channel weights and predict highly accurate instance masks. Our extensive experiments demonstrate that MARS outperforms state-of-the-art (SOTA) instance segmentation methods on three popular benchmarks such as Mask R-CNN [9], PointRend [14], and Mask Transfiner [13], by a large margin of +1.3 maskAP-based R50-FPN backbone and +2.3 maskAP-based R101-FPN backbone on Thai car-damage dataset. Our demos are available at https://www.marssolution.io.