Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kai Zhang

Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

Dec 17, 2022
Lei Ding, Jing Zhang, Kai Zhang, Haitao Guo, Bing Liu, Lorenzo Bruzzone

Figure 1 for Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

Figure 2 for Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

Figure 3 for Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

Figure 4 for Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

Semantic Change Detection (SCD) refers to the task of simultaneously extracting the changed areas and the semantic categories (before and after the changes) in Remote Sensing Images (RSIs). This is more meaningful than Binary Change Detection (BCD) since it enables detailed change analysis in the observed areas. Previous works established triple-branch Convolutional Neural Network (CNN) architectures as the paradigm for SCD. However, it remains challenging to exploit semantic information with a limited amount of change samples. In this work, we investigate to jointly consider the spatio-temporal dependencies to improve the accuracy of SCD. First, we propose a Semantic Change Transformer (SCanFormer) to explicitly model the 'from-to' semantic transitions between the bi-temporal RSIs. Then, we introduce a semantic learning scheme to leverage the spatio-temporal constraints, which are coherent to the SCD task, to guide the learning of semantic changes. The resulting network (SCanNet) significantly outperforms the baseline method in terms of both detection of critical semantic changes and semantic consistency in the obtained bi-temporal results. It achieves the SOTA accuracy on two benchmark datasets for the SCD.

Via

Access Paper or Ask Questions

CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

Dec 08, 2022
Jiezhang Cao, Qin Wang, Yongqin Xian, Yawei Li, Bingbing Ni, Zhiming Pi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc Van Gool

Figure 1 for CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

Figure 2 for CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

Figure 3 for CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

Figure 4 for CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

Learning continuous image representations is recently gaining popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble suffers from some limitations: i) it has no learnable parameters and it neglects the similarity of the visual features; ii) it has a limited receptive field and cannot ensemble relevant features in a large field which are important in an image; iii) it inherently has a gap with real camera imaging since it only depends on the coordinate. To address these issues, this paper proposes a continuous implicit attention-in-attention network, called CiaoSR. We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features. Furthermore, we embed a scale-aware attention in this implicit attention network to exploit additional non-local information. Extensive experiments on benchmark datasets demonstrate CiaoSR significantly outperforms the existing single image super resolution (SISR) methods with the same backbone. In addition, the proposed method also achieves the state-of-the-art performance on the arbitrary-scale SR task. The effectiveness of the method is also demonstrated on the real-world SR setting. More importantly, CiaoSR can be flexibly integrated into any backbone to improve the SR performance.

Via

Access Paper or Ask Questions

GreenEyes: An Air Quality Evaluating Model based on WaveNet

Dec 08, 2022
Kan Huang, Kai Zhang, Ming Liu

Figure 1 for GreenEyes: An Air Quality Evaluating Model based on WaveNet

Figure 2 for GreenEyes: An Air Quality Evaluating Model based on WaveNet

Figure 3 for GreenEyes: An Air Quality Evaluating Model based on WaveNet

Figure 4 for GreenEyes: An Air Quality Evaluating Model based on WaveNet

Accompanying rapid industrialization, humans are suffering from serious air pollution problems. The demand for air quality prediction is becoming more and more important to the government's policy-making and people's daily life. In this paper, We propose GreenEyes -- a deep neural network model, which consists of a WaveNet-based backbone block for learning representations of sequences and an LSTM with a Temporal Attention module for capturing the hidden interactions between features of multi-channel inputs. To evaluate the effectiveness of our proposed method, we carry out several experiments including an ablation study on our collected and preprocessed air quality data near HKUST. The experimental results show our model can effectively predict the air quality level of the next timestamp given any segment of the air quality data from the data set. We have also released our standalone dataset at https://github.com/AI-Huang/IAQI_Dataset The model and code for this paper are publicly available at https://github.com/AI-Huang/AirEvaluation

Via

Access Paper or Ask Questions

Changes from Classical Statistics to Modern Statistics and Data Science

Oct 30, 2022
Kai Zhang, Shan Liu, Momiao Xiong

A coordinate system is a foundation for every quantitative science, engineering, and medicine. Classical physics and statistics are based on the Cartesian coordinate system. The classical probability and hypothesis testing theory can only be applied to Euclidean data. However, modern data in the real world are from natural language processing, mathematical formulas, social networks, transportation and sensor networks, computer visions, automations, and biomedical measurements. The Euclidean assumption is not appropriate for non Euclidean data. This perspective addresses the urgent need to overcome those fundamental limitations and encourages extensions of classical probability theory and hypothesis testing , diffusion models and stochastic differential equations from Euclidean space to non Euclidean space. Artificial intelligence such as natural language processing, computer vision, graphical neural networks, manifold regression and inference theory, manifold learning, graph neural networks, compositional diffusion models for automatically compositional generations of concepts and demystifying machine learning systems, has been rapidly developed. Differential manifold theory is the mathematic foundations of deep learning and data science as well. We urgently need to shift the paradigm for data analysis from the classical Euclidean data analysis to both Euclidean and non Euclidean data analysis and develop more and more innovative methods for describing, estimating and inferring non Euclidean geometries of modern real datasets. A general framework for integrated analysis of both Euclidean and non Euclidean data, composite AI, decision intelligence and edge AI provide powerful innovative ideas and strategies for fundamentally advancing AI. We are expected to marry statistics with AI, develop a unified theory of modern statistics and drive next generation of AI and data science.

* 37 pages

Via

Access Paper or Ask Questions

BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models

Oct 19, 2022
Benjamin Brown, Kai Zhang, Xiao-Li Meng

Figure 1 for BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models

Figure 2 for BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models

Figure 3 for BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models

Figure 4 for BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models

Two linearly uncorrelated binary variables must be also independent because non-linear dependence cannot manifest with only two possible states. This inherent linearity is the atom of dependency constituting any complex form of relationship. Inspired by this observation, we develop a framework called binary expansion linear effect (BELIEF) for assessing and understanding arbitrary relationships with a binary outcome. Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models, yielding convenient theoretical insight and striking parallels with the Gaussian world. In particular, an algebraic structure on the predictors with nonzero slopes governs conditional independence properties. With BELIEF, one may study generalized linear models (GLM) through transparent linear models, providing insight into how modeling is affected by the choice of link. For example, setting a GLM interaction coefficient to zero does not necessarily lead to the kind of no-interaction model assumption as understood under their linear model counterparts. Furthermore, for a binary response, maximum likelihood estimation for GLMs paradoxically fails under complete separation, when the data are most discriminative, whereas BELIEF estimation automatically reveals the perfect predictor in the data that is responsible for complete separation. We explore these phenomena and provide a host of related theoretical results. We also provide preliminary empirical demonstration and verification of some theoretical results.

Via

Access Paper or Ask Questions

DBT-DMAE: An Effective Multivariate Time Series Pre-Train Model under Missing Data

Sep 16, 2022
Kai Zhang, Qinmin Yang, Chao Li

Figure 1 for DBT-DMAE: An Effective Multivariate Time Series Pre-Train Model under Missing Data

Figure 2 for DBT-DMAE: An Effective Multivariate Time Series Pre-Train Model under Missing Data

Figure 3 for DBT-DMAE: An Effective Multivariate Time Series Pre-Train Model under Missing Data

Figure 4 for DBT-DMAE: An Effective Multivariate Time Series Pre-Train Model under Missing Data

Multivariate time series(MTS) is a universal data type related to many practical applications. However, MTS suffers from missing data problems, which leads to degradation or even collapse of the downstream tasks, such as prediction and classification. The concurrent missing data handling procedures could inevitably arouse the biased estimation and redundancy-training problem when encountering multiple downstream tasks. This paper presents a universally applicable MTS pre-train model, DBT-DMAE, to conquer the abovementioned obstacle. First, a missing representation module is designed by introducing dynamic positional embedding and random masking processing to characterize the missing symptom. Second, we proposed an auto-encoder structure to obtain the generalized MTS encoded representation utilizing an ameliorated TCN structure called dynamic-bidirectional-TCN as the basic unit, which integrates the dynamic kernel and time-fliping trick to draw temporal features effectively. Finally, the overall feed-in and loss strategy is established to ensure the adequate training of the whole model. Comparative experiment results manifest that the DBT-DMAE outperforms the other state-of-the-art methods in six real-world datasets and two different downstream tasks. Moreover, ablation and interpretability experiments are delivered to verify the validity of DBT-DMAE's substructures.

Via

Access Paper or Ask Questions

LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Aug 29, 2022
Kai Zhang, Chongyang Tao, Tao Shen, Can Xu, Xiubo Geng, Binxing Jiao, Daxin Jiang

Figure 1 for LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Figure 2 for LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Figure 3 for LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Figure 4 for LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Retrieval models based on dense representations in semantic space have become an indispensable branch for first-stage retrieval. These retrievers benefit from surging advances in representation learning towards compressive global sequence-level embeddings. However, they are prone to overlook local salient phrases and entity mentions in texts, which usually play pivot roles in first-stage retrieval. To mitigate this weakness, we propose to make a dense retriever align a well-performing lexicon-aware representation model. The alignment is achieved by weakened knowledge distillations to enlighten the retriever via two aspects -- 1) a lexicon-augmented contrastive objective to challenge the dense encoder and 2) a pair-wise rank-consistent regularization to make dense model's behavior incline to the other. We evaluate our model on three public benchmarks, which shows that with a comparable lexicon-aware retriever as the teacher, our proposed dense one can bring consistent and significant improvements, and even outdo its teacher. In addition, we found our improvement on the dense retriever is complementary to the standard ranker distillation, which can further lift state-of-the-art performance.

* 14 pages, 6 tables, 4 figures

Via

Access Paper or Ask Questions

Practical Real Video Denoising with Realistic Degradation Model

Aug 25, 2022
Jiezhang Cao, Qin Wang, Jingyun Liang, Yulun Zhang, Kai Zhang, Luc Van Gool

Figure 1 for Practical Real Video Denoising with Realistic Degradation Model

Figure 2 for Practical Real Video Denoising with Realistic Degradation Model

Figure 3 for Practical Real Video Denoising with Realistic Degradation Model

Figure 4 for Practical Real Video Denoising with Realistic Degradation Model

Existing video denoising methods typically assume noisy videos are degraded from clean videos by adding Gaussian noise. However, deep models trained on such a degradation assumption will inevitably give rise to poor performance for real videos due to degradation mismatch. Although some studies attempt to train deep models on noisy and noise-free video pairs captured by cameras, such models can only work well for specific cameras and do not generalize well for other videos. In this paper, we propose to lift this limitation and focus on the problem of general real video denoising with the aim to generalize well on unseen real-world videos. We tackle this problem by firstly investigating the common behaviors of video noises and observing two important characteristics: 1) downscaling helps to reduce the noise level in spatial space and 2) the information from the adjacent frames help to remove the noise of current frame in temporal space. Motivated by these two observations, we propose a multi-scale recurrent architecture by making full use of the above two characteristics. Secondly, we propose a synthetic real noise degradation model by randomly shuffling different noise types to train the denoising model. With a synthesized and enriched degradation space, our degradation model can help to bridge the distribution gap between training data and real-world data. Extensive experiments demonstrate that our proposed method achieves the state-of-the-art performance and better generalization ability than existing methods on both synthetic Gaussian denoising and practical real video denoising.

Via

Access Paper or Ask Questions

Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

Aug 24, 2022
Guangqi Xie, Xin Li, Shiqi Lin, Li Zhang, Kai Zhang, Yue Li, Zhibo Chen

Figure 1 for Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

Figure 2 for Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

Figure 3 for Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

The rapid development of intelligent tasks, e.g., segmentation, detection, classification, etc, has brought an urgent need for semantic compression, which aims to reduce the compression cost while maintaining the original semantic information. However, it is impractical to directly integrate the semantic metric into the traditional codecs since they cannot be optimized in an end-to-end manner. To solve this problem, some pioneering works have applied reinforcement learning to implement image-wise semantic compression. Nevertheless, video semantic compression has not been explored since its complex reference architectures and compression modes. In this paper, we take a step forward to video semantic compression and propose the Hierarchical Reinforcement Learning based task-driven Video Semantic Coding, named as HRLVSC. Specifically, to simplify the complex mode decision of video semantic coding, we divided the action space into frame-level and CTU-level spaces in a hierarchical manner, and then explore the best mode selection for them progressively with the cooperation of frame-level and CTU-level agents. Moreover, since the modes of video semantic coding will exponentially increase with the number of frames in a Group of Pictures (GOP), we carefully investigate the effects of different mode selections for video semantic coding and design a simple but effective mode simplification strategy for it. We have validated our HRLVSC on the video segmentation task with HEVC reference software HM16.19. Extensive experimental results demonstrated that our HRLVSC can achieve over 39% BD-rate saving for video semantic coding under the Low Delay P configuration.

* Accepted by VCIP2022

Via

Access Paper or Ask Questions