We evaluate DeepLabV3+ with different backbones on a large set of steel images, with the goal of automatically detecting different types of steel defects. We apply random weighted augmentation to balance the defect types in the training set, and then train DeepLabV3+ with three different backbones (ResNet, DenseNet, and EfficientNet) to segment defect regions in the steel images. In our experiments, using ResNet101 or EfficientNet as the backbone gives the best IoU score on the test set, around 0.57, compared with 0.325 for DenseNet. The DeepLabV3+ model with a ResNet101 backbone also has the shortest training time.
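For readers unfamiliar with the metric, the IoU (intersection over union) of a predicted defect mask and a ground-truth mask can be sketched as below. This is only an illustration: the function name `binary_iou`, the per-mask binary formulation, and the convention of returning 1.0 when both masks are empty are assumptions, and the abstract does not state how scores were averaged over classes or images.

```python
import numpy as np


def binary_iou(pred_mask, true_mask):
    """IoU between two binary masks of the same shape (hypothetical helper)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)

    # Pixels marked as defect in both masks.
    intersection = np.logical_and(pred, true).sum()
    # Pixels marked as defect in at least one mask.
    union = np.logical_or(pred, true).sum()

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0
    return float(intersection / union)


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 0, 0]]
    true = [[1, 0, 0],
            [1, 0, 0]]
    # One overlapping pixel out of three marked pixels in total.
    print(binary_iou(pred, true))
```

A score of 0.57 under this definition means that, on average, a little over half of the area covered by either the prediction or the annotation is covered by both.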
Besides local features, global information plays an essential role in semantic segmentation, yet recent works usually fail to explicitly extract meaningful global information and make full use of it. In this paper, we propose a SceneEncoder module that imposes scene-aware guidance to enhance the effect of global information. The module predicts a scene descriptor, which learns to represent the categories of objects present in the scene and directly guides point-level semantic segmentation by filtering out categories that do not belong to the scene. Additionally, to alleviate segmentation noise in local regions, we design a region similarity loss that propagates distinguishing features to neighboring points with the same label, which enhances the distinguishing ability of point-wise features. We integrate our methods into several prevailing networks and conduct extensive experiments on the benchmark datasets ScanNet and ShapeNet. Results show that our methods greatly improve the performance of the baselines and achieve state-of-the-art performance.
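The filtering role of the scene descriptor can be illustrated with a small sketch. It assumes that the descriptor is a per-category score in [0, 1] and that it is applied by element-wise multiplication with per-point class probabilities; the abstract does not specify the exact operation, so `apply_scene_descriptor` and its inputs are hypothetical and not the authors' implementation.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def apply_scene_descriptor(point_logits, scene_descriptor):
    """Suppress categories the scene descriptor marks as absent.

    point_logits:     (num_points, num_classes) raw per-point scores.
    scene_descriptor: (num_classes,) values in [0, 1]; close to 0 means the
                      category is predicted not to appear in the scene.
    Returns the per-point predicted class indices after filtering.
    """
    probs = softmax(np.asarray(point_logits, dtype=float), axis=1)
    descriptor = np.asarray(scene_descriptor, dtype=float)

    # Assumed filtering rule: scale each class probability by the
    # scene-level score for that class, then pick the best class.
    guided = probs * descriptor[None, :]
    return guided.argmax(axis=1)


if __name__ == "__main__":
    logits = [[2.0, 1.0, 0.0],   # point 0 prefers class 0 locally
              [0.0, 1.0, 3.0]]   # point 1 prefers class 2 locally
    descriptor = [0.0, 1.0, 1.0]  # class 0 is judged absent from the scene
    print(apply_scene_descriptor(logits, descriptor))
```

In this toy input, the first point would be labeled class 0 from local evidence alone, but the descriptor rules that class out, so the prediction falls back to the next most likely category.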