Alert button
Picture for Dejing Dou

Dejing Dou

Alert button

ColdNAS: Search to Modulate for User Cold-Start Recommendation

Jun 06, 2023
Shiguang Wu, Yaqing Wang, Qinghe Jing, Daxiang Dong, Dejing Dou, Quanming Yao

Figure 1 for ColdNAS: Search to Modulate for User Cold-Start Recommendation
Figure 2 for ColdNAS: Search to Modulate for User Cold-Start Recommendation
Figure 3 for ColdNAS: Search to Modulate for User Cold-Start Recommendation
Figure 4 for ColdNAS: Search to Modulate for User Cold-Start Recommendation

Making personalized recommendation for cold-start users, who only have a few interaction histories, is a challenging problem in recommendation systems. Recent works leverage hypernetworks to directly map user interaction histories to user-specific parameters, which are then used to modulate predictor by feature-wise linear modulation function. These works obtain the state-of-the-art performance. However, the physical meaning of scaling and shifting in recommendation data is unclear. Instead of using a fixed modulation function and deciding modulation position by expertise, we propose a modulation framework called ColdNAS for user cold-start problem, where we look for proper modulation structure, including function and position, via neural architecture search. We design a search space which covers broad models and theoretically prove that this search space can be transformed to a much smaller space, enabling an efficient and robust one-shot search algorithm. Extensive experimental results on benchmark datasets show that ColdNAS consistently performs the best. We observe that different modulation functions lead to the best performance on different datasets, which validates the necessity of designing a searching-based method.

Viaarxiv icon

Robust Cross-Modal Knowledge Distillation for Unconstrained Videos

Apr 27, 2023
Wenke Xia, Xingjian Li, Andong Deng, Haoyi Xiong, Dejing Dou, Di Hu

Figure 1 for Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
Figure 2 for Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
Figure 3 for Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
Figure 4 for Robust Cross-Modal Knowledge Distillation for Unconstrained Videos

Cross-modal distillation has been widely used to transfer knowledge across different modalities, enriching the representation of the target unimodal one. Recent studies highly relate the temporal synchronization between vision and sound to the semantic consistency for cross-modal distillation. However, such semantic consistency from the synchronization is hard to guarantee in unconstrained videos, due to the irrelevant modality noise and differentiated semantic correlation. To this end, we first propose a \textit{Modality Noise Filter} (MNF) module to erase the irrelevant noise in teacher modality with cross-modal context. After this purification, we then design a \textit{Contrastive Semantic Calibration} (CSC) module to adaptively distill useful knowledge for target modality, by referring to the differentiated sample-wise semantic correlation in a contrastive fashion. Extensive experiments show that our method could bring a performance boost compared with other distillation methods in both visual action recognition and video retrieval task. We also extend to the audio tagging task to prove the generalization of our method. The source code is available at \href{https://github.com/GeWu-Lab/cross-modal-distillation}{https://github.com/GeWu-Lab/cross-modal-distillation}.

Viaarxiv icon

Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising

Apr 03, 2023
Miaoyu Li, Ji Liu, Ying Fu, Yulun Zhang, Dejing Dou

Figure 1 for Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising
Figure 2 for Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising
Figure 3 for Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising
Figure 4 for Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising

Denoising is a crucial step for hyperspectral image (HSI) applications. Though witnessing the great power of deep learning, existing HSI denoising methods suffer from limitations in capturing the non-local self-similarity. Transformers have shown potential in capturing long-range dependencies, but few attempts have been made with specifically designed Transformer to model the spatial and spectral correlation in HSIs. In this paper, we address these issues by proposing a spectral enhanced rectangle Transformer, driving it to explore the non-local spatial similarity and global spectral low-rank property of HSIs. For the former, we exploit the rectangle self-attention horizontally and vertically to capture the non-local similarity in the spatial domain. For the latter, we design a spectral enhancement module that is capable of extracting global underlying low-rank property of spatial-spectral cubes to suppress noise, while enabling the interactions among non-overlapping spatial rectangles. Extensive experiments have been conducted on both synthetic noisy HSIs and real noisy HSIs, showing the effectiveness of our proposed method in terms of both objective metric and subjective visual quality. The code is available at https://github.com/MyuLi/SERT.

Viaarxiv icon

Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability

Apr 01, 2023
Haoyi Xiong, Xuhong Li, Boyang Yu, Zhanxing Zhu, Dongrui Wu, Dejing Dou

Figure 1 for Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
Figure 2 for Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
Figure 3 for Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
Figure 4 for Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability

Random label noises (or observational noises) widely exist in practical machine learning settings. While previous studies primarily focus on the affects of label noises to the performance of learning, our work intends to investigate the implicit regularization effects of the label noises, under mini-batch sampling settings of stochastic gradient descent (SGD), with assumptions that label noises are unbiased. Specifically, we analyze the learning dynamics of SGD over the quadratic loss with unbiased label noises, where we model the dynamics of SGD as a stochastic differentiable equation (SDE) with two diffusion terms (namely a Doubly Stochastic Model). While the first diffusion term is caused by mini-batch sampling over the (label-noiseless) loss gradients as many other works on SGD, our model investigates the second noise term of SGD dynamics, which is caused by mini-batch sampling over the label noises, as an implicit regularizer. Our theoretical analysis finds such implicit regularizer would favor some convergence points that could stabilize model outputs against perturbation of parameters (namely inference stability). Though similar phenomenon have been investigated, our work doesn't assume SGD as an Ornstein-Uhlenbeck like process and achieve a more generalizable result with convergence of approximation proved. To validate our analysis, we design two sets of empirical studies to analyze the implicit regularizer of SGD with unbiased random label noises for deep neural networks training and linear regression.

* The complete manuscript of our previous submission to ICLR'21 (https://openreview.net/forum?id=g4szfsQUdy3). This manuscript was major done in 2021. We gave try to some venues but unfortunately haven't made it accepted yet 
Viaarxiv icon

Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks

Feb 24, 2023
Yuxuan Zhang, Qingzhong Wang, Jiang Bian, Yi Liu, Yanwu Xu, Dejing Dou, Haoyi Xiong

Figure 1 for Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks
Figure 2 for Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks
Figure 3 for Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks
Figure 4 for Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks

To address the problem of medical image recognition, computer vision techniques like convolutional neural networks (CNN) are frequently used. Recently, 3D CNN-based models dominate the field of magnetic resonance image (MRI) analytics. Due to the high similarity between MRI data and videos, we conduct extensive empirical studies on video recognition techniques for MRI classification to answer the questions: (1) can we directly use video recognition models for MRI classification, (2) which model is more appropriate for MRI, (3) are the common tricks like data augmentation in video recognition still useful for MRI classification? Our work suggests that advanced video techniques benefit MRI classification. In this paper, four datasets of Alzheimer's and Parkinson's disease recognition are utilized in experiments, together with three alternative video recognition models and data augmentation techniques that are frequently applied to video tasks. In terms of efficiency, the results reveal that the video framework performs better than 3D-CNN models by 5% - 11% with 50% - 66% less trainable parameters. This report pushes forward the potential fusion of 3D medical imaging and video understanding research.

* Accepted by IEEE ISBI'23 
Viaarxiv icon

Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement

Jan 08, 2023
Yan Li, Xinjiang Lu, Yaqing Wang, Dejing Dou

Figure 1 for Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement
Figure 2 for Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement
Figure 3 for Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement
Figure 4 for Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement

Time series forecasting has been a widely explored task of great importance in many applications. However, it is common that real-world time series data are recorded in a short time period, which results in a big gap between the deep model and the limited and noisy time series. In this work, we propose to address the time series forecasting problem with generative modeling and propose a bidirectional variational auto-encoder (BVAE) equipped with diffusion, denoise, and disentanglement, namely D3VAE. Specifically, a coupled diffusion probabilistic model is proposed to augment the time series data without increasing the aleatoric uncertainty and implement a more tractable inference process with BVAE. To ensure the generated series move toward the true target, we further propose to adapt and integrate the multiscale denoising score matching into the diffusion process for time series forecasting. In addition, to enhance the interpretability and stability of the prediction, we treat the latent variable in a multivariate manner and disentangle them on top of minimizing total correlation. Extensive experiments on synthetic and real-world data show that D3VAE outperforms competitive algorithms with remarkable margins. Our implementation is available at https://github.com/PaddlePaddle/PaddleSpatial/tree/main/research/D3VAE.

Viaarxiv icon

Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach

Jan 05, 2023
Miao Chen, Xinjiang Lu, Tong Xu, Yanyan Li, Jingbo Zhou, Dejing Dou, Hui Xiong

Figure 1 for Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach
Figure 2 for Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach
Figure 3 for Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach
Figure 4 for Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach

Although remarkable progress on the neural table-to-text methods has been made, the generalization issues hinder the applicability of these models due to the limited source tables. Large-scale pretrained language models sound like a promising solution to tackle such issues. However, how to effectively bridge the gap between the structured table and the text input by fully leveraging table information to fuel the pretrained model is still not well explored. Besides, another challenge of integrating the deliberation mechanism into the text-to-text pretrained model for solving the table-to-text task remains seldom studied. In this paper, to implement the table-to-text generation with pretrained language model, we propose a table structure understanding and text deliberating approach, namely TASD. Specifically, we devise a three-layered multi-head attention network to realize the table-structure-aware text generation model with the help of the pretrained language model. Furthermore, a multi-pass decoder framework is adopted to enhance the capability of polishing generated text for table descriptions. The empirical studies, as well as human evaluation, on two public datasets, validate that our approach can generate faithful and fluent descriptive texts for different types of tables.

Viaarxiv icon

Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution

Jan 05, 2023
Yan Li, Xinjiang Lu, Haoyi Xiong, Jian Tang, Jiantao Su, Bo Jin, Dejing Dou

Figure 1 for Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution
Figure 2 for Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution
Figure 3 for Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution
Figure 4 for Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution

Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning. Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism. Though one could lower the complexity of Transformers by inducing the sparsity in point-wise self-attentions for LTTF, the limited information utilization prohibits the model from exploring the complex dependencies comprehensively. To this end, we propose an efficient Transformerbased model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects: (i) an encoder-decoder architecture incorporating a linear complexity without sacrificing information utilization is proposed on top of sliding-window attention and Stationary and Instant Recurrent Network (SIRN); (ii) a module derived from the normalizing flow is devised to further improve the information utilization by inferring the outputs with the latent variables in SIRN directly; (iii) the inter-series correlation and temporal dynamics in time-series data are modeled explicitly to fuel the downstream self-attention mechanism. Extensive experiments on seven real-world datasets demonstrate that Conformer outperforms the state-of-the-art methods on LTTF and generates reliable prediction results with uncertainty quantification.

Viaarxiv icon

Temporal Output Discrepancy for Loss Estimation-based Active Learning

Dec 20, 2022
Siyu Huang, Tianyang Wang, Haoyi Xiong, Bihan Wen, Jun Huan, Dejing Dou

Figure 1 for Temporal Output Discrepancy for Loss Estimation-based Active Learning
Figure 2 for Temporal Output Discrepancy for Loss Estimation-based Active Learning
Figure 3 for Temporal Output Discrepancy for Loss Estimation-based Active Learning
Figure 4 for Temporal Output Discrepancy for Loss Estimation-based Active Learning

While deep learning succeeds in a wide range of tasks, it highly depends on the massive collection of annotated data which is expensive and time-consuming. To lower the cost of data annotation, active learning has been proposed to interactively query an oracle to annotate a small proportion of informative samples in an unlabeled dataset. Inspired by the fact that the samples with higher loss are usually more informative to the model than the samples with lower loss, in this paper we present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss. The core of our approach is a measurement Temporal Output Discrepancy (TOD) that estimates the sample loss by evaluating the discrepancy of outputs given by models at different optimization steps. Our theoretical investigation shows that TOD lower-bounds the accumulated sample loss thus it can be used to select informative unlabeled samples. On basis of TOD, we further develop an effective unlabeled data sampling strategy as well as an unsupervised learning criterion for active learning. Due to the simplicity of TOD, our methods are efficient, flexible, and task-agnostic. Extensive experimental results demonstrate that our approach achieves superior performances than the state-of-the-art active learning methods on image classification and semantic segmentation tasks. In addition, we show that TOD can be utilized to select the best model of potentially the highest testing accuracy from a pool of candidate models.

* Accepted for IEEE Transactions on Neural Networks and Learning Systems, 2022. Journal extension of ICCV 2021 [arXiv:2107.14153] 
Viaarxiv icon