Xin Chen

Efficient Visual Tracking via Hierarchical Cross-Attention Transformer

Mar 25, 2022
Xin Chen, Dong Wang, Dongdong Li, Huchuan Lu

In recent years, target tracking has made great progress in accuracy. This development is mainly attributed to powerful networks (such as transformers) and additional modules (such as online update and refinement modules). However, less attention has been paid to tracking speed. Most state-of-the-art trackers settle for real-time speed on powerful GPUs, yet practical applications place higher demands on tracking speed, especially when edge platforms with limited resources are used. In this work, we present an efficient tracking method via a hierarchical cross-attention transformer named HCAT. Our model runs at about 195 fps on GPU, 45 fps on CPU, and 55 fps on the NVIDIA Jetson AGX Xavier edge AI platform. Experiments show that our HCAT achieves promising results on LaSOT, GOT-10k, TrackingNet, NFS, OTB100, UAV123, and VOT2020. Code and models are available at https://github.com/chenxin-dlut/HCAT.

High-Performance Transformer Tracking

Mar 25, 2022
Xin Chen, Bin Yan, Jiawen Zhu, Dong Wang, Huchuan Lu

Correlation plays a critical role in the tracking field, especially in recent popular Siamese-based trackers. The correlation operation is a simple way of fusing features that considers the similarity between the template and the search region. However, the correlation operation is a local linear matching process that loses semantic information and easily falls into local optima, which may be the bottleneck in designing high-accuracy tracking algorithms. In this work, to determine whether a better feature fusion method than correlation exists, a novel attention-based feature fusion network, inspired by the Transformer, is presented. This network effectively combines the template and the search region features using attention. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. First, we present a Transformer tracking (named TransT) method based on the Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and the classification and regression head. Based on the TransT baseline, we further design a segmentation branch to generate an accurate mask. Finally, we propose a stronger version of TransT by extending TransT with a multi-template design and an IoU prediction head, named TransT-M. Experiments show that our TransT and TransT-M methods achieve promising results on seven popular datasets. Code and models are available at https://github.com/chenxin-dlut/TransT-M.

* arXiv admin note: substantial text overlap with arXiv:2103.15436
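
As a rough illustration of the cross-attention fusion described in the TransT abstract above, the sketch below lets each search-region feature attend to the template features with single-head scaled dot-product attention. It is not the authors' implementation: the function names, the toy 2-D features, and the omission of learned projections, multiple heads, and positional encodings are all simplifying assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    Queries come from one branch (here: the search region), keys and values
    from the other branch (here: the template). No learned projections are
    applied; that is an assumption made to keep the sketch short.
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy features: three search-region tokens, two template tokens.
search_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
template_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused_tokens = cross_attention(search_tokens, template_tokens, template_tokens)
```

Unlike correlation, each output token here is a similarity-weighted mixture of all template tokens, so the matching is global rather than local.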

Multi-view Multi-behavior Contrastive Learning in Recommendation

Mar 20, 2022
Yiqing Wu, Ruobing Xie, Yongchun Zhu, Xiang Ao, Xin Chen, Xu Zhang, Fuzhen Zhuang, Leyu Lin, Qing He

Multi-behavior recommendation (MBR) aims to jointly consider multiple behaviors to improve the target behavior's performance. We argue that MBR models should: (1) model the coarse-grained commonalities between different behaviors of a user, (2) consider both the individual sequence view and the global graph view in multi-behavior modeling, and (3) capture the fine-grained differences between multiple behaviors of a user. In this work, we propose a novel Multi-behavior Multi-view Contrastive Learning Recommendation (MMCLR) framework, including three new contrastive learning (CL) tasks that address the above challenges, respectively. The multi-behavior CL aims to make the different single-behavior representations of the same user similar within each view. The multi-view CL attempts to bridge the gap between a user's sequence-view and graph-view representations. The behavior distinction CL focuses on modeling fine-grained differences between behaviors. In experiments, we conduct extensive evaluations and ablation tests to verify the effectiveness of MMCLR and its various CL tasks on two real-world datasets, achieving SOTA performance over existing baselines. Our code will be available at https://github.com/wyqing20/MMCLR.

* DASFAA 2022 Main Conference Long Paper
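
The abstract above does not spell out the contrastive objectives, so the following is only a generic InfoNCE-style sketch of the idea behind the multi-view CL task: pull a user's sequence-view and graph-view representations together while pushing other users' representations away. The function names, the temperature value, and the toy vectors are assumptions, not details taken from MMCLR.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss: -log softmax probability of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Hypothetical embeddings: the same user's two views, plus another user's view.
sequence_view = [1.0, 0.2]
graph_view = [0.9, 0.3]
other_user = [-0.5, 1.0]
loss = info_nce(sequence_view, graph_view, [other_user])
```

The loss shrinks as the two views of the same user align and grows when the anchor is closer to another user's representation.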

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation

Mar 12, 2022
Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma

Remarkable achievements have been attained with Generative Adversarial Networks (GANs) in image-to-image translation. However, due to their tremendous number of parameters, state-of-the-art GANs usually suffer from low efficiency and bulky memory usage. To tackle this challenge, this paper first investigates GAN performance from a frequency perspective. The results show that GANs, especially small GANs, lack the ability to generate high-quality high-frequency information. To address this problem, we propose a novel knowledge distillation method referred to as wavelet knowledge distillation. Instead of directly distilling the generated images of teachers, wavelet knowledge distillation first decomposes the images into different frequency bands with the discrete wavelet transform and then distills only the high-frequency bands. As a result, the student GAN can pay more attention to learning the high-frequency bands. Experiments demonstrate that our method leads to 7.08 times compression and 6.80 times acceleration on CycleGAN with almost no performance drop. Additionally, we study the relation between discriminators and generators, which shows that compressing discriminators can promote the performance of compressed generators.

* Accepted by CVPR2022
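
To make the "decompose, then distill only the high-frequency bands" idea above concrete, here is a minimal sketch using a one-level 2D Haar transform and an L1 distance. The choice of the Haar wavelet, the single decomposition level, the L1 loss, and all function names are assumptions for illustration; the abstract does not specify them.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of a grayscale image (list of rows).

    Height and width must be even. Returns the low-frequency band and a
    tuple of three high-frequency bands (column, row, diagonal differences).
    """
    low, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, len(img), 2):
        bands = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            bands[0].append((a + b + d + e) / 2)  # local average (low frequency)
            bands[1].append((a - b + d - e) / 2)  # left vs. right column
            bands[2].append((a + b - d - e) / 2)  # top vs. bottom row
            bands[3].append((a - b - d + e) / 2)  # diagonal detail
        low.append(bands[0])
        col_diff.append(bands[1])
        row_diff.append(bands[2])
        diag.append(bands[3])
    return low, (col_diff, row_diff, diag)


def high_freq_l1(student_img, teacher_img):
    """Mean absolute difference over the high-frequency bands only."""
    _, student_high = haar_dwt2(student_img)
    _, teacher_high = haar_dwt2(teacher_img)
    total, count = 0.0, 0
    for s_band, t_band in zip(student_high, teacher_high):
        for s_row, t_row in zip(s_band, t_band):
            for s, t in zip(s_row, t_row):
                total += abs(s - t)
                count += 1
    return total / count


# Hypothetical 2x2 "images" standing in for teacher and student outputs.
teacher_out = [[1, 2], [3, 4]]
student_out = [[1, 2], [3, 5]]
distill_loss = high_freq_l1(student_out, teacher_out)
```

Because the low-frequency band is ignored, a uniform brightness shift between student and teacher contributes nothing to this loss; only differences in fine detail are penalized.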

RestainNet: a self-supervised digital re-stainer for stain normalization

Feb 28, 2022
Bingchao Zhao, Jiatai Lin, Changhong Liang, Zongjian Yi, Xin Chen, Bingbing Li, Weihao Qiu, Danyi Li, Li Liang, Chu Han, Zaiyi Liu

Color inconsistency is an inevitable challenge in computational pathology, which generally arises from stain intensity variations or from sections being scanned by different scanners. It harms pathological image analysis methods, especially learning-based models. A series of approaches have been proposed for stain normalization. However, most of them lack flexibility in practice. In this paper, we formulated stain normalization as a digital re-staining process and proposed a self-supervised learning model called RestainNet. Our network is regarded as a digital re-stainer that learns how to re-stain an unstained (grayscale) image. Two digital stains, Hematoxylin (H) and Eosin (E), were extracted from the original image by the Beer-Lambert law. We proposed a staining loss to maintain the correctness of stain intensity during the re-staining process. Thanks to the self-supervised nature, paired training samples are no longer necessary, which demonstrates great flexibility in practical usage. Our RestainNet outperforms existing approaches and achieves state-of-the-art performance with regard to color correctness and structure preservation. We further conducted experiments on segmentation and classification tasks, and the proposed RestainNet achieved outstanding performance compared with SOTA methods. The self-supervised design allows the network to learn any staining style with no extra effort.
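
The Beer-Lambert step mentioned above can be sketched for a single pixel as follows: intensities are converted to optical density, which is linear in stain concentration, and the H and E concentrations are then recovered by least squares. This is a generic color-deconvolution sketch, not RestainNet's pipeline; the background intensity `I0`, the stain vectors `H_VEC` and `E_VEC`, and the function names are illustrative placeholders rather than values from the paper.

```python
import math

I0 = 255.0                  # assumed background (unstained) intensity
H_VEC = [0.65, 0.70, 0.29]  # placeholder optical-density direction for Hematoxylin
E_VEC = [0.07, 0.99, 0.11]  # placeholder optical-density direction for Eosin


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rgb_to_od(rgb):
    """Beer-Lambert: optical density OD = -ln(I / I0), per channel."""
    return [-math.log(max(v, 1.0) / I0) for v in rgb]  # clamp to avoid log(0)


def od_to_rgb(od):
    """Inverse mapping: I = I0 * exp(-OD)."""
    return [I0 * math.exp(-d) for d in od]


def unmix(od, h_vec=H_VEC, e_vec=E_VEC):
    """Least-squares (c_h, c_e) such that od is close to c_h*h_vec + c_e*e_vec."""
    hh, he, ee = dot(h_vec, h_vec), dot(h_vec, e_vec), dot(e_vec, e_vec)
    bh, be = dot(od, h_vec), dot(od, e_vec)
    det = hh * ee - he * he
    return (bh * ee - be * he) / det, (be * hh - bh * he) / det


# Synthesize a pixel from known (hypothetical) concentrations, then recover them.
true_h, true_e = 0.8, 0.3
pixel = od_to_rgb([true_h * h + true_e * e for h, e in zip(H_VEC, E_VEC)])
est_h, est_e = unmix(rgb_to_od(pixel))
```

Working in optical density is what makes the two stains separable with a linear solve; in raw RGB space their contributions combine multiplicatively.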

Remaining Useful Life Prediction Using Temporal Deep Degradation Network for Complex Machinery with Attention-based Feature Extraction

Feb 21, 2022
Yuwen Qin, Ningbo Cai, Chen Gao, Yadong Zhang, Yonghong Cheng, Xin Chen

A precise estimate of remaining useful life (RUL) is vital for prognostic analysis and predictive maintenance, which can significantly reduce failure rates and maintenance costs. The degradation-related features extracted from sensor streaming data with neural networks can dramatically improve the accuracy of RUL prediction. The temporal deep degradation network (TDDN) model is proposed to predict RUL from the degradation-related features given by one-dimensional convolutional neural network (1D CNN) feature extraction and an attention mechanism. The 1D CNN is used to extract temporal features from the streaming sensor data. The temporal features exhibit monotonic degradation trends drawn from the fluctuating raw sensor streaming data. The attention mechanism can improve RUL prediction performance by capturing the fault characteristics and the degradation development with the attention weights. The performance of the TDDN model is evaluated on the public C-MAPSS dataset and compared with existing methods. The results show that the TDDN model achieves the best RUL prediction accuracy in complex conditions compared to current machine learning models. The degradation-related features extracted from the high-dimensional sensor streaming data demonstrate clear degradation trajectories and degradation stages that enable TDDN to predict the turbofan-engine RUL accurately and efficiently.

CenGCN: Centralized Convolutional Networks with Vertex Imbalance for Scale-Free Graphs

Feb 16, 2022
Feng Xia, Lei Wang, Tao Tang, Xin Chen, Xiangjie Kong, Giles Oatley, Irwin King

Graph Convolutional Networks (GCNs) have achieved impressive performance in a wide variety of areas, attracting considerable attention. The core step of GCNs is the information-passing framework that considers all information from neighbors to the central vertex to be equally important. Such equal importance, however, is inadequate for scale-free networks, where hub vertices propagate more dominant information due to vertex imbalance. In this paper, we propose a novel centrality-based framework named CenGCN to address the inequality of information. This framework first quantifies the similarity between hub vertices and their neighbors by label propagation with hub vertices. Based on this similarity and centrality indices, the framework transforms the graph by increasing or decreasing the weights of edges connecting hub vertices and adding self-connections to vertices. In each non-output layer of the GCN, this framework uses a hub attention mechanism to assign new weights to connected non-hub vertices based on their common information with hub vertices. We present two variants CenGCN_D and CenGCN_E, based on degree centrality and eigenvector centrality, respectively. We also conduct comprehensive experiments, including vertex classification, link prediction, vertex clustering, and network visualization. The results demonstrate that the two variants significantly outperform state-of-the-art baselines.

* IEEE Transactions on Knowledge and Data Engineering (2022)
* 16 pages, 8 figures
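
The two centrality indices named in the CenGCN abstract above can be computed on a small graph as sketched below. This only illustrates how hub vertices could be identified; it is not the CenGCN graph transformation or its hub attention mechanism. The adjacency-dictionary format, the function names, the iteration count, and the toy graph are assumptions. The eigenvector iteration uses the update x + Ax (a shifted power iteration), a common trick that keeps the iteration from oscillating on bipartite graphs.

```python
import math


def degree_centrality(adj):
    """Fraction of the other vertices each vertex is connected to."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def eigenvector_centrality(adj, iterations=100):
    """Shifted power iteration on an undirected, connected graph."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x


# Hypothetical toy graph: vertex 0 is a hub linked to every other vertex.
graph = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1],
    3: [0],
}
by_degree = degree_centrality(graph)
by_eigenvector = eigenvector_centrality(graph)
hub = max(by_eigenvector, key=by_eigenvector.get)
```

Degree centrality counts only direct neighbors, while eigenvector centrality also rewards being connected to well-connected vertices, which is why the two variants can rank hubs differently on larger graphs.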

Attention-based Deep Neural Networks for Battery Discharge Capacity Forecasting

Feb 14, 2022
Yadong Zhang, Chenye Zou, Xin Chen

Battery discharge capacity forecasting is essential for the applications of lithium-ion batteries. From the data point of view, the capacity degeneration can be treated as the memory of the initial battery state of charge. The streaming sensor data collected by battery management systems (BMS) reflect the usable battery capacity degradation rates under various operational working conditions. The battery capacity in different cycles can be measured with the temporal patterns extracted from the streaming sensor data based on the attention mechanism. The attention-based similarity regarding the first cycle can describe the battery capacity degradation in the following cycles. The deep degradation network (DDN) is developed with the attention mechanism to measure similarity and predict battery capacity. The DDN model can extract the degeneration-related temporal patterns from the streaming sensor data and perform the battery capacity prediction efficiently online in real time. Based on the MIT-Stanford open-access battery aging dataset, the root-mean-square error of the capacity estimation is 1.3 mAh. The mean absolute percentage error of the proposed DDN model is 0.06%. The DDN model also performs well on the Oxford Battery Degradation Dataset with dynamic load profiles. Therefore, the high accuracy and strong robustness of the proposed algorithm are verified.
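
For readers unfamiliar with the two error metrics quoted above, the sketch below shows how root-mean-square error and mean absolute percentage error are usually computed. The capacity values are made-up placeholders, not data from the paper, and the function names are assumptions.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same unit as the inputs (e.g. mAh)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; true values must be non-zero."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical measured vs. predicted discharge capacities in mAh.
measured = [1100.0, 1090.0, 1075.0]
predicted = [1101.0, 1088.5, 1076.0]
capacity_rmse = rmse(measured, predicted)
capacity_mape = mape(measured, predicted)
```

RMSE reports the error in absolute units, while MAPE normalizes by the true capacity, which is why the abstract can quote one in mAh and the other as a percentage.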

Fast Transient Stability Prediction Using Grid-informed Temporal and Topological Embedding Deep Neural Network

Jan 23, 2022
Peiyuan Sun, Long Huo, Siyuan Liang, Xin Chen

Transient stability prediction is essential to fast online assessment and to maintaining stable operation in power systems. The wide deployment of phasor measurement units (PMUs) promotes the development of data-driven approaches for transient stability assessment. This paper proposes the temporal and topological embedding deep neural network (TTEDNN) model to forecast transient stability from the early transient dynamics. The TTEDNN model can accurately and efficiently predict the transient stability by extracting the temporal and topological features from the time-series data of the early transient dynamics. The grid-informed adjacency matrix is used to incorporate the structural and electrical parameter information of the power grid. The transient dynamics simulation environments under single-node and multiple-node perturbations are used to test the performance of the TTEDNN model for the IEEE 39-bus and IEEE 118-bus power systems. The results show that the TTEDNN model has the best and most robust prediction performance. Furthermore, the TTEDNN model also demonstrates the transfer capability to predict the transient stability in more complicated transient dynamics simulation environments.

GPS: A Policy-driven Sampling Approach for Graph Representation Learning

Jan 19, 2022
Tiehua Zhang, Yuze Liu, Xin Chen, Xiaowei Huang, Feng Zhu, Xi Zheng

Graph representation learning has drawn increasing attention in recent years, especially for learning low-dimensional embeddings at both the node and graph level for classification and recommendation tasks. To enable representation learning on large-scale real-world graph data, numerous studies have focused on developing different sampling strategies to facilitate the training process. Herein, we propose an adaptive Graph Policy-driven Sampling model (GPS), where the influence of each node in the local neighborhood is realized through the adaptive correlation calculation. Specifically, the selection of neighbors is guided by an adaptive policy algorithm, contributing directly to the message aggregation, node embedding updating, and graph-level readout steps. We then conduct comprehensive experiments against baseline methods on graph classification tasks from various perspectives. Our proposed model outperforms the existing ones by 3%-8% on several vital benchmarks, achieving state-of-the-art performance on real-world datasets.
