Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jie Chen

Integration of Physics-Based and Data-Driven Models for Hyperspectral Image Unmixing

Jun 11, 2022
Jie Chen, Min Zhao, Xiuheng Wang, Cédric Richard, Susanto Rahardja

Figure 1 for Integration of Physics-Based and Data-Driven Models for Hyperspectral Image Unmixing

Figure 2 for Integration of Physics-Based and Data-Driven Models for Hyperspectral Image Unmixing

Figure 3 for Integration of Physics-Based and Data-Driven Models for Hyperspectral Image Unmixing

Figure 4 for Integration of Physics-Based and Data-Driven Models for Hyperspectral Image Unmixing

Spectral unmixing is one of the most important quantitative analysis tasks in hyperspectral data processing. Conventional physics-based models are characterized by clear interpretation. However, due to the complex mixture mechanism and limited nonlinearity modeling capacity, these models may not be accurate, especially, in analyzing scenes with unknown physical characteristics. Data-driven methods have developed rapidly in recent years, in particular deep learning methods as they possess superior capability in modeling complex and nonlinear systems. Simply transferring these methods as black-boxes to conduct unmixing may lead to low physical interpretability and generalization ability. Consequently, several contributions have been dedicated to integrating advantages of both physics-based models and data-driven methods. In this article, we present an overview of recent advances on this topic from several aspects, including deep neural network (DNN) structures design, prior capturing and loss design, and summarise these methods in a common mathematical optimization framework. In addition, relevant remarks and discussions are conducted made for providing further understanding and prospective improvement of the methods. The related source codes and data are collected and made available at http://github.com/xiuheng-wang/awesome-hyperspectral-image-unmixing.

* IEEE Signal Process. Mag. Manuscript submitted March 14, 2022

Via

Access Paper or Ask Questions

Universal Deep GNNs: Rethinking Residual Connection in GNNs from a Path Decomposition Perspective for Preventing the Over-smoothing

May 30, 2022
Jie Chen, Weiqi Liu, Zhizhong Huang, Junbin Gao, Junping Zhang, Jian Pu

Figure 1 for Universal Deep GNNs: Rethinking Residual Connection in GNNs from a Path Decomposition Perspective for Preventing the Over-smoothing

Figure 2 for Universal Deep GNNs: Rethinking Residual Connection in GNNs from a Path Decomposition Perspective for Preventing the Over-smoothing

Figure 3 for Universal Deep GNNs: Rethinking Residual Connection in GNNs from a Path Decomposition Perspective for Preventing the Over-smoothing

Figure 4 for Universal Deep GNNs: Rethinking Residual Connection in GNNs from a Path Decomposition Perspective for Preventing the Over-smoothing

The performance of GNNs degrades as they become deeper due to the over-smoothing. Among all the attempts to prevent over-smoothing, residual connection is one of the promising methods due to its simplicity. However, recent studies have shown that GNNs with residual connections only slightly slow down the degeneration. The reason why residual connections fail in GNNs is still unknown. In this paper, we investigate the forward and backward behavior of GNNs with residual connections from a novel path decomposition perspective. We find that the recursive aggregation of the median length paths from the binomial distribution of residual connection paths dominates output representation, resulting in over-smoothing as GNNs go deeper. Entangled propagation and weight matrices cause gradient smoothing and prevent GNNs with residual connections from optimizing to the identity mapping. Based on these findings, we present a Universal Deep GNNs (UDGNN) framework with cold-start adaptive residual connections (DRIVE) and feedforward modules. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art results over non-smooth heterophily datasets by simply stacking standard GNNs.

Via

Access Paper or Ask Questions

Interpretable Feature Engineering for Time Series Predictors using Attention Networks

May 23, 2022
Tianjie Wang, Jie Chen, Joel Vaughan, Vijayan N. Nair

Figure 1 for Interpretable Feature Engineering for Time Series Predictors using Attention Networks

Figure 2 for Interpretable Feature Engineering for Time Series Predictors using Attention Networks

Figure 3 for Interpretable Feature Engineering for Time Series Predictors using Attention Networks

Figure 4 for Interpretable Feature Engineering for Time Series Predictors using Attention Networks

Regression problems with time-series predictors are common in banking and many other areas of application. In this paper, we use multi-head attention networks to develop interpretable features and use them to achieve good predictive performance. The customized attention layer explicitly uses multiplicative interactions and builds feature-engineering heads that capture temporal dynamics in a parsimonious manner. Convolutional layers are used to combine multivariate time series. We also discuss methods for handling static covariates in the modeling process. Visualization and explanation tools are used to interpret the results and explain the relationship between the inputs and the extracted features. Both simulation and real dataset are used to illustrate the usefulness of the methodology. Keyword: Attention heads, Deep neural networks, Interpretable feature engineering

Via

Access Paper or Ask Questions

Joint learning of object graph and relation graph for visual question answering

May 09, 2022
Hao Li, Xu Li, Belhal Karimi, Jie Chen, Mingming Sun

Figure 1 for Joint learning of object graph and relation graph for visual question answering

Figure 2 for Joint learning of object graph and relation graph for visual question answering

Figure 3 for Joint learning of object graph and relation graph for visual question answering

Figure 4 for Joint learning of object graph and relation graph for visual question answering

Modeling visual question answering(VQA) through scene graphs can significantly improve the reasoning accuracy and interpretability. However, existing models answer poorly for complex reasoning questions with attributes or relations, which causes false attribute selection or missing relation in Figure 1(a). It is because these models cannot balance all kinds of information in scene graphs, neglecting relation and attribute information. In this paper, we introduce a novel Dual Message-passing enhanced Graph Neural Network (DM-GNN), which can obtain a balanced representation by properly encoding multi-scale scene graph information. Specifically, we (i)transform the scene graph into two graphs with diversified focuses on objects and relations; Then we design a dual structure to encode them, which increases the weights from relations (ii)fuse the encoder output with attribute features, which increases the weights from attributes; (iii)propose a message-passing mechanism to enhance the information transfer between objects, relations and attributes. We conduct extensive experiments on datasets including GQA, VG, motif-VG and achieve new state of the art.

* 6 pages, 4 figures, Accepted by ICME 2022

Via

Access Paper or Ask Questions

Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study

May 05, 2022
Alice J. Liu, Arpita Mukherjee, Linwei Hu, Jie Chen, Vijayan N. Nair

Figure 1 for Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study

Figure 2 for Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study

Figure 3 for Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study

Figure 4 for Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study

This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as to be expected, performances were better for continuous responses compared to binary data and with larger samples.

Via

Access Paper or Ask Questions

Explaining Adverse Actions in Credit Decisions Using Shapley Decomposition

Apr 26, 2022
Vijayan N. Nair, Tianshu Feng, Linwei Hu, Zach Zhang, Jie Chen, Agus Sudjianto

Figure 1 for Explaining Adverse Actions in Credit Decisions Using Shapley Decomposition

Figure 2 for Explaining Adverse Actions in Credit Decisions Using Shapley Decomposition

Figure 3 for Explaining Adverse Actions in Credit Decisions Using Shapley Decomposition

Figure 4 for Explaining Adverse Actions in Credit Decisions Using Shapley Decomposition

When a financial institution declines an application for credit, an adverse action (AA) is said to occur. The applicant is then entitled to an explanation for the negative decision. This paper focuses on credit decisions based on a predictive model for probability of default and proposes a methodology for AA explanation. The problem involves identifying the important predictors responsible for the negative decision and is straightforward when the underlying model is additive. However, it becomes non-trivial even for linear models with interactions. We consider models with low-order interactions and develop a simple and intuitive approach based on first principles. We then show how the methodology generalizes to the well-known Shapely decomposition and the recently proposed concept of Baseline Shapley (B-Shap). Unlike other Shapley techniques in the literature for local interpretability of machine learning results, B-Shap is computationally tractable since it involves just function evaluations. An illustrative case study is used to demonstrate the usefulness of the method. The paper also discusses situations with highly correlated predictors and desirable properties of fitted models in the credit-lending context, such as monotonicity and continuity.

* 20 pages, 8 figures

Via

Access Paper or Ask Questions

Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily

Apr 09, 2022
Jie Chen, Shouzhen Chen, Junbin Gao, Zengfeng Huang, Junping Zhang, Jian Pu

Figure 1 for Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily

Figure 2 for Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily

Figure 3 for Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily

Figure 4 for Exploiting Neighbor Effect: Conv-Agnostic GNNs Framework for Graphs with Heterophily

Due to the homophily assumption in graph convolution networks, a common consensus is that graph neural networks (GNNs) perform well on homophilic graphs but may fail on heterophilic graphs with many inter-class edges. In this work, we re-examine the heterophily problem of GNNs and investigate the feature aggregation of inter-class neighbors. Instead of treating the inter-class edges as harmful, under some conditions, we argue that they may provide helpful information for node classification from an entire neighbor's perspective. To better evaluate whether the neighbor is helpful for the downstream tasks, we present the concept of the neighbor effect of each node and use the von Neumann entropy to measure the randomness/identifiability of the neighbor distribution for each class. Moreover, we propose a Conv-Agnostic GNNs framework (CAGNNs) to enhance the performance of GNNs on heterophily datasets by learning the neighbor effect for each node. Specifically, we first decouple the feature of each node into the discriminative feature for downstream tasks and the aggregation feature for graph convolution. Then, we propose a shared mixer module for all layers to adaptively evaluate the neighbor effect of each node to incorporate the neighbor information. Experiments are performed on nine well-known benchmark datasets for the node classification task. The results indicate that our framework is able to improve the average prediction performance by 9.81%, 25.81%, and 20.61% for GIN, GAT, and GCN, respectively. Extensive ablation studies and robustness analysis further verify the effectiveness, robustness, and interpretability of our framework.

Via

Access Paper or Ask Questions

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Mar 31, 2022
Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Figure 1 for ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Figure 2 for ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Figure 3 for ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Figure 4 for ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Visual appearance is considered to be the most important cue to understand images for cross-modal retrieval, while sometimes the scene text appearing in images can provide valuable information to understand the visual semantics. Most of existing cross-modal retrieval approaches ignore the usage of scene text information and directly adding this information may lead to performance degradation in scene text free scenarios. To address this issue, we propose a full transformer architecture to unify these cross-modal retrieval scenarios in a single $\textbf{Vi}$sion and $\textbf{S}$cene $\textbf{T}$ext $\textbf{A}$ggregation framework (ViSTA). Specifically, ViSTA utilizes transformer blocks to directly encode image patches and fuse scene text embedding to learn an aggregated visual representation for cross-modal retrieval. To tackle the modality missing problem of scene text, we propose a novel fusion token based transformer aggregation approach to exchange the necessary scene text information only through the fusion token and concentrate on the most important features in each modality. To further strengthen the visual modality, we develop dual contrastive learning losses to embed both image-text pairs and fusion-text pairs into a common cross-modal space. Compared to existing methods, ViSTA enables to aggregate relevant scene text semantics with visual appearance, and hence improve results under both scene text free and scene text aware scenarios. Experimental results show that ViSTA outperforms other methods by at least $\bf{8.4}\%$ at Recall@1 for scene text aware retrieval task. Compared with state-of-the-art scene text free retrieval methods, ViSTA can achieve better accuracy on Flicker30K and MSCOCO while running at least three times faster during the inference stage, which validates the effectiveness of the proposed framework.

* Accepted by CVPR 2022

Via

Access Paper or Ask Questions

Training-free Transformer Architecture Search

Mar 23, 2022
Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji

Figure 1 for Training-free Transformer Architecture Search

Figure 2 for Training-free Transformer Architecture Search

Figure 3 for Training-free Transformer Architecture Search

Figure 4 for Training-free Transformer Architecture Search

Recently, Vision Transformer (ViT) has achieved remarkable success in several computer vision tasks. The progresses are highly relevant to the architecture design, then it is worthwhile to propose Transformer Architecture Search (TAS) to search for better ViTs automatically. However, current TAS methods are time-consuming and existing zero-cost proxies in CNN do not generalize well to the ViT search space according to our experimental observations. In this paper, for the first time, we investigate how to conduct TAS in a training-free manner and devise an effective training-free TAS (TF-TAS) scheme. Firstly, we observe that the properties of multi-head self-attention (MSA) and multi-layer perceptron (MLP) in ViTs are quite different and that the synaptic diversity of MSA affects the performance notably. Secondly, based on the observation, we devise a modular strategy in TF-TAS that evaluates and ranks ViT architectures from two theoretical perspectives: synaptic diversity and synaptic saliency, termed as DSS-indicator. With DSS-indicator, evaluation results are strongly correlated with the test accuracies of ViT models. Experimental results demonstrate that our TF-TAS achieves a competitive performance against the state-of-the-art manually or automatically design ViT architectures, and it promotes the searching efficiency in ViT search space greatly: from about $24$ GPU days to less than $0.5$ GPU days. Moreover, the proposed DSS-indicator outperforms the existing cutting-edge zero-cost approaches (e.g., TE-score and NASWOT).

* Accepted to CVPR 2022

Via

Access Paper or Ask Questions