
Tianfang Zhang

RPCANet: Deep Unfolding RPCA Based Infrared Small Target Detection

Nov 02, 2023
Fengyi Wu, Tianfang Zhang, Lei Li, Yian Huang, Zhenming Peng

Deep learning (DL) networks have achieved remarkable performance in infrared small target detection (ISTD). However, these architectures lack interpretability and are widely regarded as black boxes, as they disregard domain knowledge in ISTD. To alleviate this issue, this work proposes an interpretable deep network for detecting infrared dim targets, dubbed RPCANet. Specifically, our approach formulates the ISTD task as sparse target extraction, low-rank background estimation, and image reconstruction in a relaxed Robust Principal Component Analysis (RPCA) model. By unfolding the iterative optimization steps into a deep-learning framework, time-consuming and complex matrix computations are replaced by theory-guided neural networks. RPCANet detects targets with clear interpretability and preserves intrinsic image features, instead of directly transforming the detection task into a matrix decomposition problem. Extensive experiments substantiate the effectiveness of our deep unfolding framework and demonstrate its trustworthy results, surpassing baseline methods in both qualitative and quantitative evaluations.
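The relaxed RPCA model that RPCANet unfolds separates an image D into a low-rank background B and a sparse target component T. A minimal classical iteration, not the paper's learned network (which replaces these proximal steps with neural modules), can be sketched with alternating shrinkage and singular value thresholding; the parameters `lam` and `tau` below are illustrative:

```python
import numpy as np

def soft_threshold(x, tau):
    # Elementwise shrinkage: the proximal operator of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svt(x, tau):
    # Singular value thresholding: the proximal operator of the nuclear norm.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(soft_threshold(s, tau)) @ vt

def rpca_decompose(d, lam=0.1, tau=1.0, n_iter=50):
    """Split an image patch d into low-rank background b and sparse targets t."""
    b = np.zeros_like(d)
    t = np.zeros_like(d)
    for _ in range(n_iter):
        t = soft_threshold(d - b, lam)   # sparse target update
        b = svt(d - t, tau)              # low-rank background update
    return b, t
```

On a smooth background with one bright outlier, the sparse component picks up the outlier, which is the behaviour the unfolded network inherits.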

* WACV2024 

Design and Implementation of A Soccer Ball Detection System with Multiple Cameras

Jan 31, 2023
Lei Li, Tianfang Zhang, Zhongfeng Kang, Wenhan Zhang

The detection of small and medium-sized objects in three dimensions has long been a frontier research problem. The technology has wide applications in sports analysis, games, virtual reality, human animation, and other fields. Traditional three-dimensional small-target detection techniques suffer from high cost, low precision, and inconvenience, making them difficult to apply in practice. With the development of machine learning and deep learning, computer vision algorithms have matured considerably, and creating immersive media experiences is considered an important research direction in sports. This work explores and solves the problem of football detection with multiple cameras, aiming at the research and implementation of a live broadcast system for football matches. Multiple cameras are used to detect the target ball and determine its three-dimensional position despite occlusion, motion, and low illumination of the target object. This paper designs and implements a multi-camera football detection system for detecting and capturing targets in real-time matches. The work consists of three main parts: the football detector, single-camera detection, and multi-camera detection. The system uses bundle adjustment to obtain the three-dimensional position of the target and a GPU to accelerate data pre-processing, achieving accurate real-time capture of the target. Testing shows that the system can accurately detect and capture moving targets in 3D. In addition, the solution is reusable for other large-scale competitions, such as basketball, and the system framework can be readily transplanted to similar engineering projects. It has been put into the market.
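At the core of recovering a ball's 3D position from multiple calibrated views is triangulation; bundle adjustment, as used by the system, then jointly refines such estimates. A minimal linear (DLT) triangulation sketch, with illustrative camera matrices:

```python
import numpy as np

def triangulate(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from >= 2 camera views.

    projections: list of 3x4 camera projection matrices
    points_2d:   list of (u, v) observations of the same point, one per view
    """
    rows = []
    for p, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * p[2] - p[0])
        rows.append(v * p[2] - p[1])
    a = np.stack(rows)
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(a)
    x = vt[-1]
    return x[:3] / x[3]
```

For example, two normalized cameras P1 = [I | 0] and P2 = [I | t] observing the same point recover its 3D position exactly; with noisy detections, the SVD gives the algebraic least-squares fit that bundle adjustment would then polish.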

* 89 pages 

BuildSeg: A General Framework for the Segmentation of Buildings

Jan 15, 2023
Lei Li, Tianfang Zhang, Stefan Oehmcke, Fabian Gieseke, Christian Igel

Building segmentation from aerial images and 3D laser scanning (LiDAR) is a challenging task due to the diversity of backgrounds, building textures, and image quality. While current research using different types of convolutional and transformer networks has considerably improved performance on this task, even more accurate segmentation methods for buildings are desirable for applications such as automatic mapping. In this study, we propose a general framework termed BuildSeg, employing a generic approach that can be quickly applied to segment buildings. Different data sources were combined to increase generalization performance. The approach yields good results for different data sources, as shown by experiments on high-resolution multi-spectral and LiDAR imagery of cities in Norway, Denmark, and France. We applied ConvNeXt- and SegFormer-based models to the high-resolution aerial image dataset from the MapAI competition. The methods achieved an IoU of 0.7902 and a boundary IoU of 0.6185. We used post-processing to account for the rectangular shape of the objects, which increased the boundary IoU from 0.6185 to 0.6189.
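The IoU and boundary IoU metrics reported above can be computed for binary building masks as follows. The one-pixel erosion used to form the boundary band is a simplification (the competition metric may use a different band width), and np.roll wraps at image borders, which is harmless for masks away from the edge:

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

def boundary(mask, width=1):
    """Mask pixels within `width` of the border (4-neighbour erosion)."""
    m = mask.astype(bool)
    eroded = m.copy()
    for _ in range(width):
        # A pixel survives erosion only if all four neighbours are in the mask.
        eroded = eroded & (np.roll(eroded, 1, 0) & np.roll(eroded, -1, 0) &
                           np.roll(eroded, 1, 1) & np.roll(eroded, -1, 1))
    return m & ~eroded

def boundary_iou(pred, target, width=1):
    """IoU restricted to the boundary bands of both masks."""
    return iou(boundary(pred, width), boundary(target, width))
```

Boundary IoU penalizes ragged edges that ordinary IoU barely notices, which is why rectangular-shape post-processing moves it even when the plain IoU is unchanged.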


Mask-FPAN: Semi-Supervised Face Parsing in the Wild With De-Occlusion and UV GAN

Dec 18, 2022
Lei Li, Tianfang Zhang, Stefan Oehmcke, Fabian Gieseke, Christian Igel

Fine-grained semantic segmentation of a person's face and head, including facial parts and head components, has progressed a great deal in recent years. However, it remains a challenging task, particularly when ambiguous occlusions and large pose variations must be handled. To overcome these difficulties, we propose a novel framework termed Mask-FPAN. It uses a de-occlusion module that learns to parse occluded faces in a semi-supervised way. In particular, face landmark localization, face occlusion estimations, and detected head poses are taken into account. A 3D morphable face model combined with the UV GAN improves the robustness of 2D face parsing. In addition, we introduce two new datasets, named FaceOccMask-HQ and CelebAMaskOcc-HQ, for face parsing work. The proposed Mask-FPAN framework addresses the face parsing problem in the wild and shows significant performance improvements, with mIoU increasing from 0.7353 to 0.9013 compared to the state-of-the-art on challenging face datasets.
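The mIoU figure quoted above is, in its standard definition, the per-class IoU averaged over the face-part classes. A minimal version for integer label maps; skipping classes absent from both prediction and ground truth is one common convention:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for integer label maps of equal shape."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip rather than count as 1
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```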

* 9 pages 

LR-CSNet: Low-Rank Deep Unfolding Network for Image Compressive Sensing

Dec 18, 2022
Tianfang Zhang, Lei Li, Christian Igel, Stefan Oehmcke, Fabian Gieseke, Zhenming Peng

Deep unfolding networks (DUNs) have proven to be a viable approach to compressive sensing (CS). In this work, we propose a DUN called low-rank CS network (LR-CSNet) for natural image CS. Real-world image patches are often well-represented by low-rank approximations. LR-CSNet exploits this property by adding a low-rank prior to the CS optimization task. We derive a corresponding iterative optimization procedure using variable splitting, which is then translated to a new DUN architecture. The architecture uses low-rank generation modules (LRGMs), which learn low-rank matrix factorizations, as well as gradient descent and proximal mappings (GDPMs), which are proposed to extract high-frequency features to refine image details. In addition, the deep features generated at each reconstruction stage in the DUN are transferred between stages to boost the performance. Our extensive experiments on three widely considered datasets demonstrate the promising performance of LR-CSNet compared to state-of-the-art methods in natural image CS.
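The optimization LR-CSNet unfolds can be written as min_x ||Ax - y||² + τ||X||_*, where X is the image patch reshaped from x. A classical proximal-gradient baseline for this objective, the paper replaces the proximal step with learned LRGM/GDPM modules, looks like:

```python
import numpy as np

def svt(x, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0.0)) @ vt

def lowrank_cs_recover(a, y, shape, tau=0.01, eta=None, n_iter=200):
    """Proximal gradient descent for  min ||A x - y||^2 + tau * ||X||_*."""
    if eta is None:
        eta = 1.0 / np.linalg.norm(a, 2) ** 2    # step size from Lipschitz bound
    x = np.zeros(a.shape[1])
    for _ in range(n_iter):
        grad = a.T @ (a @ x - y)                  # data-fidelity gradient
        x = svt((x - eta * grad).reshape(shape), eta * tau).ravel()
    return x.reshape(shape)
```

With A = I the scheme reduces to nuclear-norm denoising, i.e. a single singular-value shrink of the observed patch; with a genuine sampling matrix A it iterates between enforcing measurement consistency and the low-rank prior, the two roles the network's GDPM and LRGM stages play.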


RIBAC: Towards Robust and Imperceptible Backdoor Attack against Compact DNN

Aug 22, 2022
Huy Phan, Cong Shi, Yi Xie, Tianfang Zhang, Zhuohang Li, Tianming Zhao, Jian Liu, Yan Wang, Yingying Chen, Bo Yuan

Recently, backdoor attacks have become an emerging threat to the security of deep neural network (DNN) models. To date, most existing studies focus on backdoor attacks against uncompressed models, while the vulnerability of compressed DNNs, which are widely used in practical applications, remains largely unexplored. In this paper, we propose to study and develop Robust and Imperceptible Backdoor Attack against Compact DNN models (RIBAC). By performing systematic analysis and exploration of the important design knobs, we propose a framework that can learn the proper trigger patterns, model parameters, and pruning masks in an efficient way, thereby achieving high trigger stealthiness, high attack success rate, and high model efficiency simultaneously. Extensive evaluations across different datasets, including tests against state-of-the-art defense mechanisms, demonstrate the high robustness, stealthiness, and model efficiency of RIBAC. Code is available at https://github.com/huyvnphan/ECCV2022-RIBAC
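As a generic illustration of the threat model only, this is not RIBAC itself, which jointly optimizes the trigger, model weights, and pruning masks, a basic data-poisoning backdoor blends a low-amplitude trigger into a fraction of the training images and relabels them to the attacker's target class. All names and the `alpha` blending factor here are illustrative:

```python
import numpy as np

def apply_trigger(images, trigger, alpha=0.05):
    """Blend a faint trigger pattern into images with pixel values in [0, 1]."""
    return np.clip(images + alpha * trigger, 0.0, 1.0)

def poison(images, labels, trigger, target_class, rate=0.1, seed=0):
    """Stamp the trigger onto a random fraction of samples and relabel them."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), int(rate * len(images)), replace=False)
    images, labels = images.copy(), labels.copy()
    images[idx] = apply_trigger(images[idx], trigger)
    labels[idx] = target_class   # poisoned samples point to the attacker's class
    return images, labels
```

A model trained on such data behaves normally on clean inputs but predicts the target class whenever the trigger is present; defenses evaluated in the paper attempt to detect exactly this hidden behaviour.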

* European Conference on Computer Vision (ECCV 2022)  
* Code is available at https://github.com/huyvnphan/ECCV2022-RIBAC 

AGPCNet: Attention-Guided Pyramid Context Networks for Infrared Small Target Detection

Nov 05, 2021
Tianfang Zhang, Siying Cao, Tian Pu, Zhenming Peng

Infrared small target detection is an important problem in many fields, such as earth observation, military reconnaissance, and disaster relief, and has received widespread attention recently. This paper presents the Attention-Guided Pyramid Context Network (AGPCNet) algorithm. Its main components are an Attention-Guided Context Block (AGCB), a Context Pyramid Module (CPM), and an Asymmetric Fusion Module (AFM). The AGCB divides the feature map into patches to compute local associations and uses Global Context Attention (GCA) to compute global associations between semantics; the CPM integrates features from multi-scale AGCBs; and the AFM integrates low-level and deep-level semantics from a feature-fusion perspective to enhance the utilization of features. The experimental results illustrate that AGPCNet achieves new state-of-the-art performance on two available infrared small target datasets. The source code is available at https://github.com/Tianfang-Zhang/AGPCNet.
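The AGCB's local-association step can be caricatured as self-attention restricted to non-overlapping patches of the feature map. This sketch omits the learned query/key/value projections and the GCA global branch, and assumes the spatial dimensions divide evenly by the patch size:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_self_attention(feat, patch=4):
    """Self-attention computed independently inside each patch of an (H, W, C) map."""
    h, w, c = feat.shape
    out = np.empty_like(feat)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            block = feat[i:i+patch, j:j+patch].reshape(-1, c)   # tokens in patch
            attn = softmax(block @ block.T / np.sqrt(c))        # local association
            out[i:i+patch, j:j+patch] = (attn @ block).reshape(patch, patch, c)
    return out
```

Restricting attention to patches keeps the cost linear in image size while still letting a dim target aggregate evidence from its immediate context, which is the intuition behind the AGCB design.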

* 12 pages, 13 figures, 8 tables 

Simultaneous Monitoring of Multiple People's Vital Sign Leveraging a Single Phased-MIMO Radar

Oct 15, 2021
Zhaoyi Xu, Cong Shi, Tianfang Zhang, Shuping Li, Yichao Yuan, Chung-Tse Michael Wu, Yingying Chen, Athina Petropulu

Vital sign monitoring plays a critical role in tracking the physiological state of people and enabling various health-related applications (e.g., recommending a change of lifestyle, examining the risk of diseases). Traditional approaches rely on hospitalization or body-attached instruments, which are costly and intrusive. Therefore, researchers have been exploring contact-less vital sign monitoring with radio frequency signals in recent years. Early studies with continuous-wave radars/WiFi devices detect the vital signs of a single individual, but it remains challenging to simultaneously monitor the vital signs of multiple subjects, especially those located in close proximity. In this paper, we design and implement a time-division multiplexing (TDM) phased-MIMO radar sensing scheme for high-precision vital sign monitoring of multiple people. Our phased-MIMO radar can steer the mmWave beam towards different directions with a microsecond delay, which enables capturing the vital signs of multiple individuals at the same radial distance from the radar. Furthermore, we develop a TDM-MIMO technique to fully utilize all transmitting antenna (TX)-receiving antenna (RX) pairs, thereby significantly boosting the signal-to-noise ratio. Based on the designed TDM phased-MIMO radar, we develop a system to automatically localize multiple human subjects and estimate their vital signs. Extensive evaluations show that under two-subject scenarios, our system can achieve errors of less than 1 beat per minute (BPM) and 3 BPM for breathing rate (BR) and heart rate (HR) estimation, respectively, at a subject-to-radar distance of 1.6 m. The minimal subject-to-subject angular separation is 40°, corresponding to a close distance of 0.5 m between two subjects, which outperforms the state-of-the-art.
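Once a beam is steered at a subject, breathing and heartbeat rates are typically read off as spectral peaks of the demodulated phase signal within physiological frequency bands. A minimal single-subject sketch of that last step; the actual system additionally performs beamforming, subject localization, and TDM-MIMO combination:

```python
import numpy as np

def estimate_rate_bpm(phase, fs, lo_bpm, hi_bpm):
    """Dominant periodicity of a radar phase signal, in breaths/beats per minute.

    phase: demodulated phase samples; fs: sampling rate in Hz;
    lo_bpm/hi_bpm: physiological search band (e.g. 6-30 BPM for breathing).
    """
    spectrum = np.abs(np.fft.rfft(phase - phase.mean()))
    freqs = np.fft.rfftfreq(len(phase), d=1.0 / fs) * 60.0  # axis in cycles/min
    band = (freqs >= lo_bpm) & (freqs <= hi_bpm)            # restrict the search
    return freqs[band][np.argmax(spectrum[band])]
```

Separate bands (roughly 6-30 BPM for breathing, 50-120 BPM for heartbeat) let the same routine extract both rates from one phase stream.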


Probabilistic feature extraction, dose statistic prediction and dose mimicking for automated radiation therapy treatment planning

Feb 24, 2021
Tianfang Zhang, Rasmus Bokrantz, Jimmy Olsson

Purpose: We propose a general framework for quantifying predictive uncertainties of dose-related quantities and leveraging this information in a dose mimicking problem in the context of automated radiation therapy treatment planning. Methods: A three-step pipeline, comprising feature extraction, dose statistic prediction and dose mimicking, is employed. In particular, the features are produced by a convolutional variational autoencoder and used as inputs in a previously developed nonparametric Bayesian statistical method, estimating the multivariate predictive distribution of a collection of predefined dose statistics. Specially developed objective functions are then used to construct a dose mimicking problem based on the produced distributions, creating deliverable treatment plans. Results: The numerical experiments are performed using a dataset of 94 retrospective treatment plans of prostate cancer patients. We show that the features extracted by the variational autoencoder capture geometric information of substantial relevance to the dose statistic prediction problem, that the estimated predictive distributions are reasonable and outperform a benchmark method, and that the deliverable plans agree well with their clinical counterparts. Conclusions: We demonstrate that prediction of dose-related quantities may be extended to include uncertainty estimation and that such probabilistic information may be leveraged in a dose mimicking problem. The treatment plans produced by the proposed pipeline resemble their original counterparts well, illustrating the merits of a holistic approach to automated planning based on probabilistic modeling.
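Predefined dose statistics in such pipelines are commonly dose-at-volume values D_v: the dose that at least a fraction v of a structure's voxels receives. As a point definition, the paper goes further and predicts full predictive distributions over such statistics, it is simply an upper quantile of the voxel dose distribution:

```python
import numpy as np

def dose_at_volume(dose, v):
    """D_v: dose received by at least the fraction v of the structure's voxels.

    dose: 1-D array of per-voxel doses; v: volume fraction in (0, 1).
    """
    return float(np.quantile(dose, 1.0 - v))
```

For instance, D_0.5 is the median voxel dose, and D_0.02 approximates a near-maximum dose of the structure.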


A similarity-based Bayesian mixture-of-experts model

Dec 03, 2020
Tianfang Zhang, Rasmus Bokrantz, Jimmy Olsson

We present a new nonparametric mixture-of-experts model for multivariate regression problems, inspired by the probabilistic $k$-nearest neighbors algorithm. Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point, yielding predictive distributions represented by Gaussian mixtures. Posterior inference is performed on the parameters of the mixture components as well as the distance metric using a mean-field variational Bayes algorithm together with a stochastic gradient-based optimization procedure. The proposed method is especially advantageous in settings where inputs are of relatively high dimension in comparison to the data size, where input--output relationships are complex, and where predictive distributions may be skewed or multimodal. Computational studies on two synthetic datasets and one dataset comprising dose statistics of radiation therapy treatment plans show that our mixture-of-experts method outperforms a Gaussian process benchmark model both in terms of validation metrics and visual inspection.
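The predictive mechanism, similarities to each training point inducing a Gaussian mixture over outputs, can be sketched in its simplest form. The Gaussian kernel, fixed length scale, and fixed noise level sigma below stand in for the paper's learned distance metric and variational posterior:

```python
import numpy as np

def predict_mixture(x_train, y_train, x_new, length_scale=1.0, sigma=0.5):
    """Similarity-weighted Gaussian-mixture prediction, probabilistic-kNN style.

    Each training point contributes a component N(y_i, sigma^2) with weight
    proportional to its kernel similarity to the query x_new.
    """
    d2 = ((x_train - x_new) ** 2).sum(axis=1)
    w = np.exp(-0.5 * d2 / length_scale ** 2)   # similarity to each data point
    w = w / w.sum()
    mean = w @ y_train                          # mixture mean
    # Mixture variance = within-component noise + between-component spread.
    var = sigma ** 2 + w @ (y_train - mean) ** 2
    return mean, var
```

When the query sits between clusters of training points with different outputs, the between-component term inflates the variance, the skewed or multimodal predictive behaviour a single Gaussian (e.g. a GP) cannot express.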
