Tianrui Li

DRAformer: Differentially Reconstructed Attention Transformer for Time-Series Forecasting

Jun 11, 2022
Benhan Li, Shengdong Du, Tianrui Li, Jie Hu, Zhen Jia

Time-series forecasting plays an important role in many real-world scenarios, such as equipment life cycle forecasting, weather forecasting, and traffic flow forecasting. It can be observed from recent research that a variety of transformer-based models have shown remarkable results in time-series forecasting. However, there are still some issues that limit the ability of transformer-based models on time-series forecasting tasks: (i) learning directly on raw data is susceptible to noise due to its complex and unstable feature representation; (ii) the self-attention mechanisms pay insufficient attention to changing features and temporal dependencies. In order to solve these two problems, we propose a transformer-based differentially reconstructed attention model DRAformer. Specifically, DRAformer has the following innovations: (i) learning against differenced sequences, which preserves clear and stable sequence features by differencing and highlights the changing properties of sequences; (ii) the reconstructed attention: integrated distance attention exhibits sequential distance through a learnable Gaussian kernel, distributed difference attention calculates distribution difference by mapping the difference sequence to the adaptive feature space, and the combination of the two effectively focuses on the sequences with prominent associations; (iii) the reconstructed decoder input, which extracts sequence features by integrating variation information and temporal correlations, thereby obtaining a more comprehensive sequence representation. Extensive experiments on four large-scale datasets demonstrate that DRAformer outperforms state-of-the-art baselines.
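As a rough illustration of two ideas named in the abstract above, the sketch below applies first-order differencing to a series and builds attention-like weights that decay with positional distance through a Gaussian kernel. The function names and the fixed `sigma` are assumptions made for this example; in the paper the kernel is described as learnable, and this is not the authors' implementation.

```python
import math

def difference(series):
    """First-order differencing: d[t] = x[t+1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]

def gaussian_distance_weights(length, sigma):
    """Row-normalised weights that shrink as the distance |i - j| grows.

    sigma is fixed here for illustration (assumption); a model would learn it.
    """
    weights = []
    for i in range(length):
        row = [math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) for j in range(length)]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights

series = [10.0, 12.0, 15.0, 15.0, 11.0]
diffs = difference(series)
weights = gaussian_distance_weights(len(diffs), sigma=1.0)
```

Each row of `weights` sums to 1 and puts the most mass on nearby positions, which is one simple way to encode sequential distance.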

Spatio-Temporal Dynamic Graph Relation Learning for Urban Metro Flow Prediction

Apr 06, 2022
Peng Xie, Minbo Ma, Tianrui Li, Shenggong Ji, Shengdong Du, Zeng Yu, Junbo Zhang

Urban metro flow prediction is of great value for metro operation scheduling, passenger flow management and personal travel planning. However, it faces two main challenges. First, different metro stations, e.g. transfer stations and non-transfer stations, have unique traffic patterns. Second, it is challenging to model the complex spatio-temporal dynamic relations among metro stations. To address these challenges, we develop a spatio-temporal dynamic graph relational learning model (STDGRL) to predict urban metro station flow. First, we propose a spatio-temporal node embedding representation module to capture the traffic patterns of different stations. Second, we employ a dynamic graph relationship learning module to learn dynamic spatial relationships between metro stations without a predefined graph adjacency matrix. Finally, we provide a transformer-based long-term relationship prediction module for long-term metro flow prediction. Extensive experiments are conducted based on metro data in Beijing, Shanghai, Chongqing and Hangzhou. Experimental results show the advantages of our method over 11 baselines for urban metro flow prediction.

Incremental Unsupervised Feature Selection for Dynamic Incomplete Multi-view Data

Apr 05, 2022
Yanyong Huang, Kejun Guo, Xiuwen Yi, Zhong Li, Tianrui Li

Multi-view unsupervised feature selection has been proven to be efficient in reducing the dimensionality of multi-view unlabeled data with high dimensions. Previous methods assume that all of the views are complete. However, in real applications, the multi-view data are often incomplete, i.e., some views of instances are missing, which causes these methods to fail. Besides, when the data arrive in the form of streams, these existing methods suffer from high storage cost and expensive computation time. To address these issues, we propose an Incremental Incomplete Multi-view Unsupervised Feature Selection method (I$^2$MUFS) on incomplete multi-view streaming data. By jointly considering the consistent and complementary information across different views, I$^2$MUFS embeds the unsupervised feature selection into an extended weighted non-negative matrix factorization model, which can learn a consensus clustering indicator matrix and fuse different latent feature matrices with adaptive view weights. Furthermore, we introduce incremental learning mechanisms to develop an alternative iterative algorithm, where the feature selection matrix is incrementally updated, rather than recomputed on the entire updated data from scratch. A series of experiments are conducted to verify the effectiveness of the proposed method by comparing it with several state-of-the-art methods. The experimental results demonstrate the effectiveness and efficiency of the proposed method in terms of the clustering metrics and the computational cost.

Spatio-Temporal Latent Graph Structure Learning for Traffic Forecasting

Feb 25, 2022
Jiabin Tang, Tang Qian, Shijing Liu, Shengdong Du, Jie Hu, Tianrui Li

Accurate traffic forecasting, the foundation of intelligent transportation systems (ITS), has never been more significant than it is now, owing to the growth of smart cities and urban computing. Recently, Graph Neural Networks (GNNs) have outperformed traditional methods. Nevertheless, most conventional GNN-based models work well when given a pre-defined graph structure, and the existing methods of defining graph structures focus purely on spatial dependencies and ignore temporal correlation. Besides, the semantics of the static pre-defined graph adjacency applied during the whole training process are always incomplete, thus overlooking latent topologies that may fine-tune the model. To tackle these challenges, we propose a new traffic forecasting framework, Spatio-Temporal Latent Graph Structure Learning networks (ST-LGSL). More specifically, the model employs a graph generator based on a Multilayer Perceptron (MLP) and K-Nearest Neighbor (kNN), which learns the latent graph topological information from the entire data, considering both spatial and temporal dynamics. Furthermore, with the initialization of MLP-kNN based on the ground-truth adjacency matrix and the similarity metric in kNN, ST-LGSL aggregates topologies focusing on geography and node similarity. Additionally, the generated graphs act as the input of a spatio-temporal prediction module that combines Diffusion Graph Convolutions and Gated Temporal Convolution Networks. Experimental results on two real-world benchmark datasets demonstrate that ST-LGSL outperforms various types of state-of-the-art baselines.
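To make the kNN part of the graph generator concrete, here is a minimal sketch that keeps, for each node, only its `k` most similar neighbours under cosine similarity. The node embeddings, the function names, and the choice of cosine similarity are assumptions for illustration; the MLP stage and the initialization from a ground-truth adjacency matrix described in the abstract are omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_adjacency(embeddings, k):
    """Build a sparse adjacency matrix that keeps the top-k neighbours per node."""
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)          # most similar neighbours first
        for sim, j in sims[:k]:
            adj[i][j] = sim              # all other entries stay at zero
    return adj

# Hypothetical 2-D node embeddings: nodes 0/1 and nodes 2/3 form two similar pairs.
nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
adj = knn_adjacency(nodes, k=1)
```

With `k=1`, node 0 connects only to node 1 and node 2 only to node 3, so the resulting graph reflects node similarity rather than a fixed, pre-defined topology.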

A Differential Attention Fusion Model Based on Transformer for Time Series Forecasting

Feb 23, 2022
Benhan Li, Shengdong Du, Tianrui Li

Time series forecasting is widely used in equipment life cycle forecasting, weather forecasting, traffic flow forecasting, and other fields. Recently, some scholars have tried to apply Transformer to time series forecasting because of its powerful parallel training ability. However, existing Transformer methods do not pay enough attention to the small time segments that play a decisive role in prediction, which makes them insensitive to small changes that affect the trend of a time series and makes it difficult to effectively learn continuous time-dependent features. To solve this problem, we propose a differential attention fusion model based on Transformer, which adds a differential layer, neighbor attention, a sliding fusion mechanism, and a residual layer to the classical Transformer architecture. Specifically, the differences of adjacent time points are extracted and focused on by the differential layer and neighbor attention. The sliding fusion mechanism fuses various features of each time point so that the data can participate in encoding and decoding without losing important information. The residual layer, which includes convolution and LSTM, further learns the dependence between time points and enables our model to carry out deeper training. A large number of experiments on three datasets show that the prediction results produced by our method are favorably comparable to the state-of-the-art.

HiSTGNN: Hierarchical Spatio-temporal Graph Neural Networks for Weather Forecasting

Jan 22, 2022
Minbo Ma, Peng Xie, Fei Teng, Tianrui Li, Bin Wang, Shenggong Ji, Junbo Zhang

Weather forecasting is an attractive yet challenging task due to its influence on human life and the complexity of atmospheric motion. Supported by massive historical observed time series data, the task is suitable for data-driven approaches, especially deep neural networks. Recently, Graph Neural Network (GNN) based methods have achieved excellent performance for spatio-temporal forecasting. However, canonical GNN-based methods model only the local graph of meteorological variables per station or the global graph of all stations individually, lacking information interaction between meteorological variables in different stations. In this paper, we propose a novel Hierarchical Spatio-Temporal Graph Neural Network (HiSTGNN) to model cross-regional spatio-temporal correlations among meteorological variables in multiple stations. An adaptive graph learning layer and spatial graph convolution are employed to construct a self-learning graph and study hidden dependencies among nodes of the variable-level and station-level graphs. To capture temporal patterns, dilated inception is designed as the backbone of gated temporal convolution to model long and varied meteorological trends. Moreover, dynamic interaction learning is proposed to build bidirectional information passing in the hierarchical graph. Experimental results on three real-world meteorological datasets demonstrate the superior performance of HiSTGNN over 7 baselines, and it reduces the errors by 4.2% to 11.6% compared to the state-of-the-art weather forecasting method.

Unsupervised feature selection via self-paced learning and low-redundant regularization

Dec 14, 2021
Weiyi Li, Hongmei Chen, Tianrui Li, Jihong Wan, Binbin Sang

Much more attention has been paid to unsupervised feature selection nowadays due to the emergence of massive unlabeled data. The distribution of samples and the latent effect of training a learning method using samples in a more effective order need to be considered so as to improve the robustness of the method. Self-paced learning is an effective method that considers the training order of samples. In this study, an unsupervised feature selection method is proposed by integrating the framework of self-paced learning and subspace learning. Moreover, the local manifold structure is preserved and the redundancy of features is constrained by two regularization terms. The $L_{2,1/2}$-norm is applied to the projection matrix, which aims to retain discriminative features and further alleviate the effect of noise in the data. Then, an iterative method is presented to solve the optimization problem. The convergence of the method is proved theoretically and experimentally. The proposed method is compared with other state-of-the-art algorithms on nine real-world datasets. The experimental results show that the proposed method can improve the performance of clustering methods and outperform the other compared algorithms.
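For readers unfamiliar with the $L_{2,1/2}$-norm, the sketch below computes the general $L_{2,p}$ quantity $\left(\sum_i \|w_i\|_2^p\right)^{1/p}$ over the rows $w_i$ of a matrix, with $p = 1/2$ as the case named in the abstract, and then ranks features by row norm. This uses the standard definition as an assumption; the exact form of the regularizer in the paper may differ, and the matrix `W` is made up for the example.

```python
import math

def row_norms(matrix):
    """Euclidean norm of each row; each row corresponds to one feature."""
    return [math.sqrt(sum(x * x for x in row)) for row in matrix]

def l2p_norm(matrix, p):
    """L_{2,p} value: (sum_i ||w_i||_2^p)^(1/p). With p < 1 it favours row sparsity."""
    return sum(r ** p for r in row_norms(matrix)) ** (1.0 / p)

def rank_features(matrix):
    """Feature indices ordered from largest to smallest row norm."""
    norms = row_norms(matrix)
    return sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)

# Hypothetical projection matrix: 3 features projected to 2 dimensions.
W = [[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]]
value = l2p_norm(W, 0.5)      # (sqrt(5) + 0 + 1) ** 2
ranking = rank_features(W)    # feature 0 first, the all-zero feature 1 last
```

Rows driven to zero by the regularizer correspond to features that can be discarded, which is why row norms are a natural ranking score.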

ScaleVLAD: Improving Multimodal Sentiment Analysis via Multi-Scale Fusion of Locally Descriptors

Dec 02, 2021
Huaishao Luo, Lei Ji, Yanyong Huang, Bin Wang, Shenggong Ji, Tianrui Li

Fusion technique is a key research topic in multimodal sentiment analysis. Recent attention-based fusion demonstrates advances over simple operation-based fusion. However, these fusion works adopt a single-scale, i.e., token-level or utterance-level, unimodal representation. Such single-scale fusion is suboptimal because different modalities should be aligned at different granularities. This paper proposes a fusion model named ScaleVLAD to gather multi-Scale representations from text, video, and audio with shared Vectors of Locally Aggregated Descriptors to improve unaligned multimodal sentiment analysis. These shared vectors can be regarded as shared topics to align different modalities. In addition, we propose a self-supervised shifted clustering loss to keep the fused features differentiated among samples. The backbones are three Transformer encoders corresponding to the three modalities, and the aggregated features generated by the fusion module are fed to a Transformer plus a fully connected layer to produce the task predictions. Experiments on three popular sentiment analysis benchmarks, IEMOCAP, MOSI, and MOSEI, demonstrate significant gains over baselines.

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval

May 08, 2021
Huaishao Luo, Lei Ji, Ming Zhong, Yang Chen, Wen Lei, Nan Duan, Tianrui Li

Video-text retrieval plays an essential role in multi-modal research and has been widely used in many real-world web applications. CLIP (Contrastive Language-Image Pre-training), an image-language pre-training model, has demonstrated the power of learning visual concepts from web-collected image-text datasets. In this paper, we propose a CLIP4Clip model to transfer the knowledge of the CLIP model to video-language retrieval in an end-to-end manner. Several questions are investigated via empirical studies: 1) Is the image feature enough for video-text retrieval? 2) How does post-pretraining on a large-scale video-text dataset based on CLIP affect the performance? 3) What is a practical mechanism to model temporal dependency between video frames? 4) How sensitive is the model to hyper-parameters on the video-text retrieval task? Extensive experimental results show that the CLIP4Clip model transferred from CLIP can achieve SOTA results on various video-text retrieval datasets, including MSR-VTT, MSVD, LSMDC, ActivityNet, and DiDeMo. We release our code at https://github.com/ArrowLuo/CLIP4Clip.
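One simple baseline for turning per-frame image features into a video-text score is to average the frame embeddings and compare the result with the text embedding by cosine similarity. The sketch below shows only that baseline, with made-up 2-D embeddings and hypothetical function names; it ignores temporal order entirely and is not taken from the released CLIP4Clip code.

```python
import math

def mean_pool(frames):
    """Average frame embeddings into one video embedding (order-agnostic)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical embeddings; a real system would take these from image and text encoders.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [1.0, 1.0]

video = mean_pool(frames)
score = cosine(video, text)
```

Because the pooled vector is the same for any frame order, this baseline cannot capture temporal dependency, which is the gap the third question in the abstract is about.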
