Zhijiang Zhang

STENet: Superpixel Token Enhancing Network for RGB-D Salient Object Detection

Mar 23, 2026

Jianlin Chen, Gongyang Li, Zhijiang Zhang, Liang Chang, Dan Zeng

Abstract: Transformer-based methods for RGB-D Salient Object Detection (SOD) have gained significant interest, owing to the transformer's exceptional capacity to capture long-range pixel dependencies. Nevertheless, current RGB-D SOD methods face challenges such as the quadratic complexity of the attention mechanism and limited local detail extraction. To overcome these limitations, we propose a novel Superpixel Token Enhancing Network (STENet), which introduces superpixels into cross-modal interaction. STENet follows the two-stream encoder-decoder structure. At its core are two tailored superpixel-driven cross-modal interaction modules, responsible for global and local feature enhancement. Specifically, we update the superpixel generation method by expanding the neighborhood range of each superpixel, allowing for flexible transformation between pixels and superpixels. With the updated superpixel generation method, we first propose the Superpixel Attention Global Enhancing Module to model the global pixel-to-superpixel relationship rather than the traditional global pixel-to-pixel relationship, which captures region-level information and reduces computational complexity. We also propose the Superpixel Attention Local Refining Module, which leverages pixel similarity within superpixels to select a subset of pixels (i.e., local pixels) and then performs feature enhancement on these local pixels, thereby capturing the relevant local details. Furthermore, we fuse the globally and locally enhanced features along with the cross-scale features to achieve a comprehensive feature representation. Experiments on seven RGB-D SOD datasets show that STENet achieves competitive performance compared to state-of-the-art methods. The code and results of our method are available at https://github.com/Mark9010/STENet.

* 12 pages, 8 figures, accepted by IEEE TMM
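
The complexity argument in the abstract above (pixel-to-superpixel attention instead of pixel-to-pixel attention) can be illustrated with a minimal sketch. This is not the STENet implementation: the hard superpixel assignment, the averaging used to build tokens, the choice of RGB features as queries and depth features as tokens, and all function names are assumptions made only for illustration. With N pixels and M superpixels, the attention map has N×M entries instead of N×N.

```python
import math

def superpixel_tokens(features, labels, num_superpixels):
    """Average the pixel features inside each superpixel (hard assignment, assumed)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_superpixels)]
    counts = [0] * num_superpixels
    for feat, lab in zip(features, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += feat[d]
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_to_superpixel_attention(queries, tokens):
    """Each pixel query attends over the M superpixel tokens (N*M scores in total)."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        scores = [scale * sum(qd * td for qd, td in zip(q, t)) for t in tokens]
        weights = softmax(scores)
        out.append([sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)])
    return out

# Toy data: 6 pixels with 2-D features, grouped into 2 superpixels.
rgb_feats = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]]
depth_feats = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0], [0.0, 2.0]]
labels = [0, 0, 0, 1, 1, 1]

tokens = superpixel_tokens(depth_feats, labels, num_superpixels=2)
enhanced = pixel_to_superpixel_attention(rgb_feats, tokens)

num_scores_superpixel = len(rgb_feats) * len(tokens)  # N * M
num_scores_pixel = len(rgb_feats) ** 2                # N * N
```

In this toy case the pixel-to-superpixel map needs 12 similarity scores where full pixel-to-pixel attention would need 36; the gap widens as the image grows while the number of superpixels stays small.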

DefFiller: Mask-Conditioned Diffusion for Salient Steel Surface Defect Generation

Dec 20, 2024

Yichun Tai, Zhenzhen Huang, Tao Peng, Zhijiang Zhang

Abstract: Current saliency-based defect detection methods show promise in industrial settings, but the unpredictability of defects in steel production environments complicates dataset creation, hampering model performance. Existing data augmentation approaches using generative models often require pixel-level annotations, which are time-consuming and resource-intensive. To address this, we introduce DefFiller, a mask-conditioned defect generation method that leverages a layout-to-image diffusion model. DefFiller generates defect samples paired with mask conditions, eliminating the need for pixel-level annotations and enabling direct use in model training. We also develop an evaluation framework to assess the quality of generated samples and their impact on detection performance. Experimental results on the SD-Saliency-900 dataset demonstrate that DefFiller produces high-quality defect images that accurately match the provided mask conditions, significantly enhancing the performance of saliency-based defect detection models trained on the augmented dataset.

* 20 pages, 10 figures

Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition

May 03, 2024

Yichun Tai, Kun Yang, Tao Peng, Zhenzhen Huang, Zhijiang Zhang

Abstract: Steel surface defect recognition is an industrial problem of great practical value. Data insufficiency is the major challenge in training a robust defect recognition network. Existing methods enlarge the dataset by generating samples with generative models. However, their generation quality is still limited by the scarcity of defect image samples. To this end, we propose Stable Surface Defect Generation (StableSDG), which transfers the vast generation distribution embedded in the Stable Diffusion model to steel surface defect image generation. To bridge the distinct distribution gap between steel surface images and the images generated by the diffusion model, we propose two processes. First, we align the distributions by adapting the parameters of the diffusion model, in both the token embedding space and the network parameter space. Second, in the generation process, we propose image-oriented generation rather than generation from pure Gaussian noise. We conduct extensive experiments on a steel surface defect dataset, demonstrating state-of-the-art performance in generating high-quality samples and training recognition models, and showing that both designed processes contribute significantly to the performance.
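
The abstract above does not say how generation is started from an image rather than from pure Gaussian noise. One common way to do this with diffusion models, shown here only as an assumed illustration and not as the StableSDG procedure, is to partially noise a real image with the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, and begin denoising from step t instead of from the final step. The linear beta schedule, the step index, and the function names below are hypothetical choices.

```python
import math
import random

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed schedule)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_image(pixels, t, alpha_bars, rng):
    """DDPM forward process: mix the image with Gaussian noise at step t."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in pixels]

alpha_bars = linear_alpha_bars(1000)
rng = random.Random(0)

# A flattened toy "image" with values scaled to [-1, 1].
image = [0.5, -0.25, 0.0, 1.0]

# Start from an intermediate step: part of the image signal survives,
# unlike at the last step, where the sample is almost pure noise.
start = noise_image(image, 400, alpha_bars, rng)
signal_kept_mid = math.sqrt(alpha_bars[400])
signal_kept_end = math.sqrt(alpha_bars[-1])
```

A denoiser would then run the reverse process from step 400 down to 0; that part depends on the trained model and is omitted here.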

BARS: A Benchmark for Airport Runway Segmentation

Oct 24, 2022

Wenhui Chen, Zhijiang Zhang, Liang Yu, Yichun Tai

Abstract: Airport runway segmentation can effectively reduce the accident rate during the landing phase, which carries the highest risk of flight accidents. With the rapid development of deep learning, related methods perform well on segmentation tasks and adapt well to complex scenes. However, the lack of large-scale, publicly available datasets in this field makes it difficult to develop deep learning-based methods. Therefore, we propose a Benchmark for Airport Runway Segmentation, named BARS. A semi-automatic annotation pipeline is also designed to reduce the annotation workload. BARS has the largest dataset with the richest categories and the only instance annotation in the field. The dataset, which is collected using the X-Plane simulation platform, contains 10,002 images and 29,347 instances in three categories. We evaluate eight representative instance segmentation methods on BARS and analyze their performance. Because airport runways have a regular shape, we propose a plug-and-play smoothing post-processing module (SPPM) and a contour point constraint loss (CPCL) function to smooth segmentation results for mask-based and contour-based methods, respectively. Furthermore, a novel evaluation metric named average smoothness (AS) is developed to measure smoothness. The experiments show that existing instance segmentation methods achieve good prediction results on BARS. SPPM and CPCL improve the average accuracy by 0.9% and 1.13%, respectively, and improve the average smoothness by more than 50% and 28%, respectively. Our work will be released at https://github.com/c-wenhui/BARS.

* 14 pages, 8 figures, 4 tables

Object Skeleton Extraction in Natural Images by Fusing Scale-associated Deep Side Outputs

Apr 06, 2016

Wei Shen, Kai Zhao, Yuan Jiang, Yan Wang, Zhijiang Zhang, Xiang Bai

Abstract: The object skeleton is a useful cue for object detection, complementary to the object contour, as it provides a structural representation that describes the relationship among object parts. However, object skeleton extraction in natural images is a very challenging problem, as it requires the extractor to capture both local and global image context to determine the intrinsic scale of each skeleton pixel. Existing methods rely on per-pixel multi-scale feature computation, which makes modeling difficult and is time-consuming. In this paper, we present a fully convolutional network with multiple scale-associated side outputs to address this problem. By observing the relationship between the receptive field sizes of the sequential stages in the network and the skeleton scales they can capture, we introduce a scale-associated side output to each stage. We impose supervision on different stages by guiding the scale-associated side outputs toward ground-truth skeletons of different scales. The responses of the multiple scale-associated side outputs are then fused in a scale-specific way to effectively localize skeleton pixels at multiple scales. Our method achieves promising results on two skeleton extraction datasets and significantly outperforms competing methods.

* Accepted by CVPR 2016