Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shujian Yu

Robust and Fast Measure of Information via Low-rank Representation

Nov 30, 2022

Yuxin Dong, Tieliang Gong, Shujian Yu, Hong Chen, Chen Li

Abstract:The matrix-based R\'enyi's entropy allows us to directly quantify information measures from given data, without explicit estimation of the underlying probability distribution. This intriguing property makes it widely applied in statistical inference and machine learning tasks. However, this information theoretical quantity is not robust against noise in the data, and is computationally prohibitive in large-scale applications. To address these issues, we propose a novel measure of information, termed low-rank matrix-based R\'enyi's entropy, based on low-rank representations of infinitely divisible kernel matrices. The proposed entropy functional inherits the specialty of of the original definition to directly quantify information from data, but enjoys additional advantages including robustness and effective calculation. Specifically, our low-rank variant is more sensitive to informative perturbations induced by changes in underlying distributions, while being insensitive to uninformative ones caused by noises. Moreover, low-rank R\'enyi's entropy can be efficiently approximated by random projection and Lanczos iteration techniques, reducing the overall complexity from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2 s)$ or even $\mathcal{O}(ns^2)$, where $n$ is the number of data samples and $s \ll n$. We conduct large-scale experiments to evaluate the effectiveness of this new information measure, demonstrating superior results compared to matrix-based R\'enyi's entropy in terms of both performance and computational efficiency.

Via

Access Paper or Ask Questions

Information-Theoretic Hashing for Zero-Shot Cross-Modal Retrieval

Sep 26, 2022

Yufeng Shi, Shujian Yu, Duanquan Xu, Xinge You

Abstract:Zero-shot cross-modal retrieval (ZS-CMR) deals with the retrieval problem among heterogenous data from unseen classes. Typically, to guarantee generalization, the pre-defined class embeddings from natural language processing (NLP) models are used to build a common space. In this paper, instead of using an extra NLP model to define a common space beforehand, we consider a totally different way to construct (or learn) a common hamming space from an information-theoretic perspective. We term our model the Information-Theoretic Hashing (ITH), which is composed of two cascading modules: an Adaptive Information Aggregation (AIA) module; and a Semantic Preserving Encoding (SPE) module. Specifically, our AIA module takes the inspiration from the Principle of Relevant Information (PRI) to construct a common space that adaptively aggregates the intrinsic semantics of different modalities of data and filters out redundant or irrelevant information. On the other hand, our SPE module further generates the hashing codes of different modalities by preserving the similarity of intrinsic semantics with the element-wise Kullback-Leibler (KL) divergence. A total correlation regularization term is also imposed to reduce the redundancy amongst different dimensions of hash codes. Sufficient experiments on three benchmark datasets demonstrate the superiority of the proposed ITH in ZS-CMR. Source code is available in the supplementary material.

Via

Access Paper or Ask Questions

Principle of Relevant Information for Graph Sparsification

May 31, 2022

Shujian Yu, Francesco Alesiani, Wenzhe Yin, Robert Jenssen, Jose C. Principe

Figure 1 for Principle of Relevant Information for Graph Sparsification

Figure 2 for Principle of Relevant Information for Graph Sparsification

Figure 3 for Principle of Relevant Information for Graph Sparsification

Figure 4 for Principle of Relevant Information for Graph Sparsification

Abstract:Graph sparsification aims to reduce the number of edges of a graph while maintaining its structural properties. In this paper, we propose the first general and effective information-theoretic formulation of graph sparsification, by taking inspiration from the Principle of Relevant Information (PRI). To this end, we extend the PRI from a standard scalar random variable setting to structured data (i.e., graphs). Our Graph-PRI objective is achieved by operating on the graph Laplacian, made possible by expressing the graph Laplacian of a subgraph in terms of a sparse edge selection vector $\mathbf{w}$. We provide both theoretical and empirical justifications on the validity of our Graph-PRI approach. We also analyze its analytical solutions in a few special cases. We finally present three representative real-world applications, namely graph sparsification, graph regularized multi-task learning, and medical imaging-derived brain network classification, to demonstrate the effectiveness, the versatility and the enhanced interpretability of our approach over prevalent sparsification techniques. Code of Graph-PRI is available at https://github.com/SJYuCNEL/PRI-Graphs

* accepted by UAI-22

Via

Access Paper or Ask Questions

Optimal Randomized Approximations for Matrix based Renyi's Entropy

May 16, 2022

Yuxin Dong, Tieliang Gong, Shujian Yu, Chen Li

Figure 1 for Optimal Randomized Approximations for Matrix based Renyi's Entropy

Figure 2 for Optimal Randomized Approximations for Matrix based Renyi's Entropy

Figure 3 for Optimal Randomized Approximations for Matrix based Renyi's Entropy

Figure 4 for Optimal Randomized Approximations for Matrix based Renyi's Entropy

Abstract:The Matrix-based Renyi's entropy enables us to directly measure information quantities from given data without the costly probability density estimation of underlying distributions, thus has been widely adopted in numerous statistical learning and inference tasks. However, exactly calculating this new information quantity requires access to the eigenspectrum of a semi-positive definite (SPD) matrix $A$ which grows linearly with the number of samples $n$, resulting in a $O(n^3)$ time complexity that is prohibitive for large-scale applications. To address this issue, this paper takes advantage of stochastic trace approximations for matrix-based Renyi's entropy with arbitrary $\alpha \in R^+$ orders, lowering the complexity by converting the entropy approximation to a matrix-vector multiplication problem. Specifically, we develop random approximations for integer order $\alpha$ cases and polynomial series approximations (Taylor and Chebyshev) for non-integer $\alpha$ cases, leading to a $O(n^2sm)$ overall time complexity, where $s,m \ll n$ denote the number of vector queries and the polynomial order respectively. We theoretically establish statistical guarantees for all approximation algorithms and give explicit order of s and m with respect to the approximation error $\varepsilon$, showing optimal convergence rate for both parameters up to a logarithmic factor. Large-scale simulations and real-world applications validate the effectiveness of the developed approximations, demonstrating remarkable speedup with negligible loss in accuracy.

Via

Access Paper or Ask Questions

BrainIB: Interpretable Brain Network-based Psychiatric Diagnosis with Graph Information Bottleneck

May 07, 2022

Kaizhong Zheng, Shujian Yu, Baojuan Li, Robert Jenssen, Badong Chen

Figure 1 for BrainIB: Interpretable Brain Network-based Psychiatric Diagnosis with Graph Information Bottleneck

Figure 2 for BrainIB: Interpretable Brain Network-based Psychiatric Diagnosis with Graph Information Bottleneck

Figure 3 for BrainIB: Interpretable Brain Network-based Psychiatric Diagnosis with Graph Information Bottleneck

Figure 4 for BrainIB: Interpretable Brain Network-based Psychiatric Diagnosis with Graph Information Bottleneck

Abstract:Developing a new diagnostic models based on the underlying biological mechanisms rather than subjective symptoms for psychiatric disorders is an emerging consensus. Recently, machine learning-based classifiers using functional connectivity (FC) for psychiatric disorders and healthy controls are developed to identify brain markers. However, existing machine learningbased diagnostic models are prone to over-fitting (due to insufficient training samples) and perform poorly in new test environment. Furthermore, it is difficult to obtain explainable and reliable brain biomarkers elucidating the underlying diagnostic decisions. These issues hinder their possible clinical applications. In this work, we propose BrainIB, a new graph neural network (GNN) framework to analyze functional magnetic resonance images (fMRI), by leveraging the famed Information Bottleneck (IB) principle. BrainIB is able to identify the most informative regions in the brain (i.e., subgraph) and generalizes well to unseen data. We evaluate the performance of BrainIB against 6 popular brain network classification methods on two multi-site, largescale datasets and observe that our BrainIB always achieves the highest diagnosis accuracy. It also discovers the subgraph biomarkers which are consistent to clinical and neuroimaging findings.

* 12 pages, 4 figures

Via

Access Paper or Ask Questions

Multi-view Information Bottleneck Without Variational Approximation

Apr 22, 2022

Qi Zhang, Shujian Yu, Jingmin Xin, Badong Chen

Figure 1 for Multi-view Information Bottleneck Without Variational Approximation

Figure 2 for Multi-view Information Bottleneck Without Variational Approximation

Figure 3 for Multi-view Information Bottleneck Without Variational Approximation

Figure 4 for Multi-view Information Bottleneck Without Variational Approximation

Abstract:By "intelligently" fusing the complementary information across different views, multi-view learning is able to improve the performance of classification tasks. In this work, we extend the information bottleneck principle to a supervised multi-view learning scenario and use the recently proposed matrix-based R{\'e}nyi's $\alpha$-order entropy functional to optimize the resulting objective directly, without the necessity of variational approximation or adversarial training. Empirical results in both synthetic and real-world datasets suggest that our method enjoys improved robustness to noise and redundant information in each view, especially given limited training samples. Code is available at~\url{https://github.com/archy666/MEIB}.

* Manuscript is accepted by ICASSP-22

Via

Access Paper or Ask Questions

R2-Trans:Fine-Grained Visual Categorization with Redundancy Reduction

Apr 21, 2022

Yu Wang, Shuo Ye, Shujian Yu, Xinge You

Figure 1 for R2-Trans:Fine-Grained Visual Categorization with Redundancy Reduction

Figure 2 for R2-Trans:Fine-Grained Visual Categorization with Redundancy Reduction

Figure 3 for R2-Trans:Fine-Grained Visual Categorization with Redundancy Reduction

Figure 4 for R2-Trans:Fine-Grained Visual Categorization with Redundancy Reduction

Abstract:Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories, whose main challenge is the large intraclass diversities and subtle inter-class differences. Existing FGVC methods usually select discriminant regions found by a trained model, which is prone to neglect other potential discriminant information. On the other hand, the massive interactions between the sequence of image patches in ViT make the resulting class-token contain lots of redundant information, which may also impacts FGVC performance. In this paper, we present a novel approach for FGVC, which can simultaneously make use of partial yet sufficient discriminative information in environmental cues and also compress the redundant information in class-token with respect to the target. Specifically, our model calculates the ratio of high-weight regions in a batch, adaptively adjusts the masking threshold and achieves moderate extraction of background information in the input space. Moreover, we also use the Information Bottleneck~(IB) approach to guide our network to learn a minimum sufficient representations in the feature space. Experimental results on three widely-used benchmark datasets verify that our approach can achieve outperforming performance than other state-of-the-art approaches and baseline models.

Via

Access Paper or Ask Questions

Deep Deterministic Independent Component Analysis for Hyperspectral Unmixing

Feb 15, 2022

Hongming Li, Shujian Yu, Jose C. Principe

Figure 1 for Deep Deterministic Independent Component Analysis for Hyperspectral Unmixing

Figure 2 for Deep Deterministic Independent Component Analysis for Hyperspectral Unmixing

Figure 3 for Deep Deterministic Independent Component Analysis for Hyperspectral Unmixing

Figure 4 for Deep Deterministic Independent Component Analysis for Hyperspectral Unmixing

Abstract:We develop a new neural network based independent component analysis (ICA) method by directly minimizing the dependence amongst all extracted components. Using the matrix-based R{\'e}nyi's $\alpha$-order entropy functional, our network can be directly optimized by stochastic gradient descent (SGD), without any variational approximation or adversarial training. As a solid application, we evaluate our ICA in the problem of hyperspectral unmixing (HU) and refute a statement that "\emph{ICA does not play a role in unmixing hyperspectral data}", which was initially suggested by \cite{nascimento2005does}. Code and additional remarks of our DDICA is available at https://github.com/hongmingli1995/DDICA.

* Accepted by ICASSP 2022

Via

Access Paper or Ask Questions

Computationally Efficient Approximations for Matrix-based Renyi's Entropy

Dec 27, 2021

Tieliang Gong, Yuxin Dong, Shujian Yu, Hong Chen, Bo Dong, Chen Li, Qinghua Zheng

Figure 1 for Computationally Efficient Approximations for Matrix-based Renyi's Entropy

Figure 2 for Computationally Efficient Approximations for Matrix-based Renyi's Entropy

Figure 3 for Computationally Efficient Approximations for Matrix-based Renyi's Entropy

Figure 4 for Computationally Efficient Approximations for Matrix-based Renyi's Entropy

Abstract:The recently developed matrix based Renyi's entropy enables measurement of information in data simply using the eigenspectrum of symmetric positive semi definite (PSD) matrices in reproducing kernel Hilbert space, without estimation of the underlying data distribution. This intriguing property makes the new information measurement widely adopted in multiple statistical inference and learning tasks. However, the computation of such quantity involves the trace operator on a PSD matrix $G$ to power $\alpha$(i.e., $tr(G^\alpha)$), with a normal complexity of nearly $O(n^3)$, which severely hampers its practical usage when the number of samples (i.e., $n$) is large. In this work, we present computationally efficient approximations to this new entropy functional that can reduce its complexity to even significantly less than $O(n^2)$. To this end, we first develop randomized approximations to $\tr(\G^\alpha)$ that transform the trace estimation into matrix-vector multiplications problem. We extend such strategy for arbitrary values of $\alpha$ (integer or non-integer). We then establish the connection between the matrix-based Renyi's entropy and PSD matrix approximation, which enables us to exploit both clustering and block low-rank structure of $\G$ to further reduce the computational cost. We theoretically provide approximation accuracy guarantees and illustrate the properties of different approximations. Large-scale experimental evaluations on both synthetic and real-world data corroborate our theoretical findings, showing promising speedup with negligible loss in accuracy.

Via

Access Paper or Ask Questions

Gated Information Bottleneck for Generalization in Sequential Environments

Oct 12, 2021

Francesco Alesiani, Shujian Yu, Xi Yu

Figure 1 for Gated Information Bottleneck for Generalization in Sequential Environments

Figure 2 for Gated Information Bottleneck for Generalization in Sequential Environments

Figure 3 for Gated Information Bottleneck for Generalization in Sequential Environments

Figure 4 for Gated Information Bottleneck for Generalization in Sequential Environments

Abstract:Deep neural networks suffer from poor generalization to unseen environments when the underlying data distribution is different from that in the training set. By learning minimum sufficient representations from training data, the information bottleneck (IB) approach has demonstrated its effectiveness to improve generalization in different AI applications. In this work, we propose a new neural network-based IB approach, termed gated information bottleneck (GIB), that dynamically drops spurious correlations and progressively selects the most task-relevant features across different environments by a trainable soft mask (on raw features). GIB enjoys a simple and tractable objective, without any variational approximation or distributional assumption. We empirically demonstrate the superiority of GIB over other popular neural network-based IB approaches in adversarial robustness and out-of-distribution (OOD) detection. Meanwhile, we also establish the connection between IB theory and invariant causal representation learning, and observed that GIB demonstrates appealing performance when different environments arrive sequentially, a more practical scenario where invariant risk minimization (IRM) fails. Code of GIB is available at https://github.com/falesiani/GIB

* manuscript accepted by IEEE ICDM-21 (regular papers), code is available at https://github.com/falesiani/GIB

Via

Access Paper or Ask Questions