Nucleus image segmentation is a crucial step in cell analysis, pathological diagnosis, and classification, all of which rely heavily on the quality of nucleus segmentation. However, issues such as variations in nucleus size, blurred nucleus contours, uneven staining, cell clustering, and overlapping cells pose significant challenges. Current methods for nucleus segmentation rely primarily on nuclear morphology or contour-based approaches. Nuclear-morphology-based methods exhibit limited generalization ability and struggle to predict irregularly shaped nuclei effectively, while contour-based extraction methods face challenges in accurately segmenting overlapping nuclei. To address these issues, we propose a dual-branch network using hybrid-attention-based residual U-blocks for nucleus instance segmentation. The network simultaneously predicts target information and target contours. Additionally, we introduce a post-processing method that combines the target information and target contours to distinguish overlapping nuclei and generate an instance segmentation image. Within the network, we propose a context fusion block (CF-block) that effectively extracts and merges contextual information from the network. Extensive quantitative evaluations are conducted to assess the performance of our method. Experimental results demonstrate the superior performance of the proposed method compared to state-of-the-art approaches on the BNS, MoNuSeg, CoNSeP, and CPM-17 datasets.
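As an illustration of such contour-aware post-processing, here is a minimal sketch (not necessarily the paper's exact procedure) that subtracts the predicted contour map from the predicted foreground to obtain per-nucleus seeds, then recovers instances with a marker-controlled watershed; `fg_prob` and `contour_prob` are hypothetical dual-branch outputs in [0, 1].

```python
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def split_instances(fg_prob, contour_prob, t_fg=0.5, t_ct=0.5):
    """Split touching nuclei by erasing predicted contours before seeding."""
    foreground = fg_prob > t_fg
    # Removing contour pixels separates touching nuclei into distinct seeds.
    seeds = foreground & ~(contour_prob > t_ct)
    markers, _ = ndi.label(seeds)
    # Grow each seed back over the full foreground to obtain instance labels.
    distance = ndi.distance_transform_edt(foreground)
    return watershed(-distance, markers, mask=foreground)
```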
Cervical cancer is one of the most severe diseases threatening women's health. Early detection and diagnosis can significantly reduce cancer risk, and cervical cytology classification is indispensable to this process. Researchers have recently designed many networks for automated cervical cancer diagnosis, but the limited accuracy and bulky size of these individual models cannot meet practical application needs. To address this issue, we propose a Voting-Stacking ensemble strategy, which employs three Inception networks as base learners and integrates their outputs through a voting ensemble. The samples misclassified by the ensemble model generate a new training set, on which a linear classification model is trained as the meta-learner to perform the final predictions. In addition, a multi-level Stacking ensemble framework is designed to further improve performance. The method is evaluated on the SIPaKMeD, Herlev, and Mendeley datasets, achieving accuracies of 100%, 100%, and 100%, respectively. The experimental results outperform the current state-of-the-art (SOTA) methods, demonstrating the method's potential for reducing screening workload and helping pathologists detect cervical cancer.
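A schematic sketch of the Voting-Stacking idea follows, with lightweight scikit-learn classifiers standing in for the three Inception base learners; the hard-voting rule and the meta-learner's inputs here are our assumptions, not the paper's specification.

```python
import numpy as np
from scipy.stats import mode
from sklearn.linear_model import LogisticRegression

def fit_voting_stacking(base_models, X, y):
    # Hard-vote the base learners' predictions (stand-ins for Inception nets).
    preds = np.stack([m.fit(X, y).predict(X) for m in base_models], axis=1)
    vote = mode(preds, axis=1, keepdims=False).mode
    # Samples the vote misclassifies form the meta-learner's training set
    # (assumes at least one misclassified sample exists).
    wrong = vote != y
    meta = LogisticRegression(max_iter=1000).fit(X[wrong], y[wrong])
    return vote, meta
```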
In recent years, the channel attention mechanism has been widely investigated for its great potential in improving the performance of deep convolutional neural networks (CNNs). However, in most existing methods, only the output of the adjacent convolution layer is fed to the attention layer for calculating the channel weights; information from other convolution layers is ignored. Motivated by these observations, a simple strategy, named Bridge Attention Net (BA-Net), is proposed for better channel attention mechanisms. The main idea of this design is to bridge the outputs of the previous convolution layers through skip connections for channel weight generation. BA-Net not only provides richer features for calculating channel weights during the forward pass, but also multiplies the paths for parameter updating during the backward pass. Comprehensive evaluation demonstrates that the proposed approach achieves state-of-the-art performance compared with existing methods in terms of accuracy and speed. Bridge Attention provides a fresh perspective on the design of neural network architectures and shows great potential for improving the performance of existing channel attention mechanisms. The code is available at \url{https://github.com/zhaoy376/Attention-mechanism}.
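The following PyTorch sketch illustrates the bridging idea under our own assumptions (the published BA-Net fuses features differently in detail): channel weights are computed from several earlier layer outputs rather than from the last one alone.

```python
import torch
import torch.nn as nn

class BridgeAttention(nn.Module):
    """Channel attention fed by multiple bridged layer outputs (illustrative)."""
    def __init__(self, channels, num_bridged, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels * num_bridged, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, features):
        # `features`: list of (N, C, H, W) outputs from the bridged layers.
        pooled = [f.mean(dim=(2, 3)) for f in features]   # global average pool
        weights = self.fc(torch.cat(pooled, dim=1))       # (N, C) channel weights
        return features[-1] * weights[:, :, None, None]
```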
Point cloud data, as one kind of representation of 3D objects, are the most primitive output obtained by 3D sensors. Unlike 2D images, point clouds are disordered and unstructured, so it is not straightforward to apply classification techniques such as convolutional neural networks to point cloud analysis directly. To solve this problem, we propose a novel network structure, named Attention-based Graph Convolution Networks (AGCN), to extract point cloud features. Treating the learning process as message propagation between adjacent points, we introduce an attention mechanism into AGCN to analyze the relationships between local features of the points. In addition, we introduce an additional global graph structure network to compensate for the relative information of individual points in the graph structure network. The proposed network is also extended to an encoder-decoder structure for segmentation tasks. Experimental results show that the proposed network achieves state-of-the-art performance in both classification and segmentation tasks.
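As an illustrative sketch of attention-weighted message propagation between neighboring points (the actual AGCN layer likely differs in detail), consider:

```python
import torch
import torch.nn as nn

class PointAttentionConv(nn.Module):
    """Aggregate k-NN neighbor features with learned attention weights."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim, out_dim)
        self.att = nn.Linear(2 * in_dim, 1)

    def forward(self, x, knn_idx):
        # x: (N, in_dim) point features; knn_idx: (N, k) neighbor indices.
        neighbors = x[knn_idx]                              # (N, k, in_dim)
        center = x.unsqueeze(1).expand_as(neighbors)        # (N, k, in_dim)
        scores = self.att(torch.cat([center, neighbors], dim=-1))
        alpha = torch.softmax(scores, dim=1)                # weights over neighbors
        return (alpha * self.msg(neighbors)).sum(dim=1)     # (N, out_dim)
```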
Multiple Camera Systems (MCS) have been widely used in many vision applications and have attracted much attention recently. There are two principal types of MCS: the Rigid Multiple Camera System (RMCS) and the Articulated Camera System (ACS). In an RMCS, the relative poses (relative 3-D position and orientation) between the cameras are invariant. In an ACS, by contrast, the cameras are articulated through movable joints, so the relative poses between them may change. Therefore, in calibrating an ACS we want to find not only the relative poses between the cameras but also the positions of the joints in the ACS. In this paper, we develop calibration algorithms for the ACS using a simple constraint: each joint is fixed relative to the cameras connected with it during the transformations of the ACS. When the transformations of the cameras in an ACS can be estimated relative to the same coordinate system, the positions of the joints in the ACS can be calculated by solving linear equations. However, in a non-overlapping-view ACS, only the ego-transformations of the cameras can be estimated, and we propose a two-step method to deal with this problem. In both cases, the ACS is assumed to have performed general transformations in a static environment. The efficiency and robustness of the proposed methods are tested by simulated and real experiments. In the real experiment, the intrinsic and extrinsic parameters of the ACS are obtained simultaneously by our calibration procedure using the same image sequences; no extra data-capture step is required. The corresponding trajectory is recovered and illustrated using the calibration results of the ACS. Since the estimated translations of different cameras in an ACS may be scaled by different scale factors, a scale-factor estimation algorithm is also proposed. To our knowledge, we are the first to study the calibration of the ACS.
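A hedged numerical sketch of the fixed-joint constraint in the common-coordinate-system case: if the joint has fixed coordinates p_A in camera A's frame and p_B in camera B's frame, each estimated pose pair yields R_i^A p_A + t_i^A = R_i^B p_B + t_i^B, a linear system in (p_A, p_B) solvable by least squares. The function below is our own formulation, not the paper's code.

```python
import numpy as np

def joint_position(RA, tA, RB, tB):
    # RA, RB: (n, 3, 3) rotations; tA, tB: (n, 3) translations, all in one frame.
    # Each time step contributes: R_i^A p_A - R_i^B p_B = t_i^B - t_i^A.
    A = np.concatenate([RA, -RB], axis=2).reshape(-1, 6)   # stacked [R_i^A, -R_i^B]
    b = (tB - tA).reshape(-1)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p[:3], p[3:]                                    # p_A, p_B
```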
In this article, a video-based early fire-alarm system is developed by monitoring the smoke in the scene. There are two major contributions in this work. First, to find the best texture feature for smoke detection, a general framework, named Histograms of Equivalent Patterns (HEP), is adopted to carry out an extensive evaluation of various kinds of texture features. Second, the \emph{Block-based Inter-Frame Difference} (BIFD) and an improved version of LBP-TOP are proposed and ensembled to describe the space-time characteristics of the smoke. To reduce false alarms, the Smoke History Image (SHI) is utilized to register the recent classification results of candidate smoke blocks. Experimental results using an SVM classifier show that the proposed method achieves higher accuracy and fewer false alarms compared with state-of-the-art technologies.
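A rough sketch of what a block-based inter-frame difference can look like (block size and threshold are illustrative assumptions; the paper's BIFD descriptor is richer than this binary candidate map):

```python
import numpy as np

def bifd(prev, curr, block=16, thresh=8.0):
    """Flag blocks whose mean absolute intensity change exceeds a threshold."""
    h, w = curr.shape
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    # Average the frame difference inside each non-overlapping block.
    diff = diff[: h // block * block, : w // block * block]
    means = diff.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return means > thresh   # boolean map of candidate smoke blocks
```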
Many efforts have been devoted to developing alternatives to traditional vector quantization in the image domain, such as sparse coding and soft-assignment. These approaches can be split into a dictionary learning phase and a feature encoding phase, which are often closely connected. In this paper, we investigate the effects of these phases by separating them for video-based action classification. We compare several dictionary learning methods and feature encoding schemes through extensive experiments on the KTH and HMDB51 datasets. Experimental results indicate that sparse coding performs consistently better than the other encoding methods on the large, complex dataset (i.e., HMDB51), and that it is robust to different dictionaries. On the small, simple dataset (i.e., KTH), with less variation, all the encoding strategies perform competitively. In addition, we note that the strength of sophisticated encoding approaches comes not from their corresponding dictionaries but from the encoding mechanisms, and we can simply use randomly selected exemplars as dictionaries for video-based action classification.
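The takeaway lends itself to a short sketch: encode local descriptors with sparse coding over a dictionary of randomly selected exemplars instead of a learned one. This uses scikit-learn's sparse_encode; the dataset shapes and parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((5000, 128))        # stand-in local video features
# Dictionary = randomly selected exemplars, no dictionary learning step.
dictionary = descriptors[rng.choice(5000, 400, replace=False)]
codes = sparse_encode(descriptors, dictionary, algorithm="lasso_lars", alpha=0.1)
# Max-pool the sparse codes into a single video-level representation.
video_feature = np.abs(codes).max(axis=0)
```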