Abstract:Using information-theoretic concepts to understand and explore the inner organization of deep neural networks (DNNs) remains a significant challenge. Recently, the concept of an information plane (coupled with the famed information bottleneck principle) began to shed light on the analysis of multilayer perceptrons (MLPs). We previously provided in-depth insight into stacked autoencoders (SAEs) using a novel matrix-based Rényi's α-entropy functional, enabling for the first time the analysis of the dynamics of learning using information flow in real-world scenarios involving complex network architectures and large data. Despite the great potential of these past works, several open questions remain when it comes to applying information-theoretic concepts to understand convolutional neural networks (CNNs). These include, for instance, the accurate estimation of information quantities among multiple variables and the many different training methodologies. By extending the matrix-based Rényi's α-entropy functional to a multivariate scenario and introducing the partial information decomposition (PID) framework, this paper presents a systematic method to analyze CNN training using information theory. Our results validate two fundamental data processing inequalities in CNNs and also reveal some fundamental issues embedded in the training phase of CNNs.
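As a concrete anchor for the quantities used throughout these abstracts, below is a minimal sketch of the matrix-based Rényi's α-entropy estimator and the resulting mutual information between two variables, following the published functional (entropy from the eigenspectrum of a trace-normalized Gram matrix; joint terms via the Hadamard product). The Gaussian kernel, the kernel width `sigma`, and the order `alpha` are free choices, not prescriptions from the papers.

```python
import numpy as np

def gram_matrix(x, sigma=1.0):
    """Trace-normalized Gaussian Gram matrix A (so tr(A) = 1)."""
    sq = np.sum(x**2, axis=1, keepdims=True)
    d2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)  # guard tiny negatives
    K = np.exp(-d2 / (2.0 * sigma**2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=1.01):
    """Matrix-based Renyi alpha-entropy from the eigenspectrum of A."""
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def mutual_information(x, y, alpha=1.01, sigma=1.0):
    """I_alpha(X;Y) = S(A) + S(B) - S(A∘B / tr(A∘B)), '∘' = Hadamard."""
    A, B = gram_matrix(x, sigma), gram_matrix(y, sigma)
    AB = A * B
    return (renyi_entropy(A, alpha) + renyi_entropy(B, alpha)
            - renyi_entropy(AB / np.trace(AB), alpha))
```

Tracking I(X;T) between the input and a layer's activations over training epochs is what traces out the information-plane trajectories these abstracts refer to.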
Abstract:Despite their great success in practical applications, there is still a lack of theoretical and systematic methods to analyze deep neural networks. In this paper, we illustrate an advanced information-theoretic methodology to understand the dynamics of learning and the design of autoencoders, a special type of deep learning architecture that resembles a communication channel. By generalizing the information plane to any cost function, and by inspecting the roles and dynamics of different layers using layer-wise information quantities, we emphasize the role that mutual information plays in quantifying learning from data. We further suggest, and experimentally validate for mean square error training, three fundamental properties regarding the layer-wise flow of information and the intrinsic dimensionality of the bottleneck layer, using respectively the data processing inequality and the identification of a bifurcation point in the information plane that is controlled by the given data. Our observations have a direct impact on the optimal design of autoencoders, the design of alternative feedforward training methods, and even the problem of generalization.
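One of the properties above is a layer-wise data processing inequality along the encoder, I(X;T_1) ≥ I(X;T_2) ≥ ⋯. A hedged sketch of how one might check it empirically from saved layer activations, reusing the `mutual_information` helper sketched after the first abstract; the tolerance is an arbitrary numerical guard, not part of the method.

```python
import numpy as np
# reuses mutual_information(...) from the sketch after the first abstract

def check_layerwise_dpi(x, layer_activations, alpha=1.01, sigma=1.0, tol=1e-6):
    """Empirically verify I(X;T_1) >= I(X;T_2) >= ... along the encoder."""
    mi = [mutual_information(x, t, alpha=alpha, sigma=sigma)
          for t in layer_activations]
    holds = all(a >= b - tol for a, b in zip(mi, mi[1:]))
    return mi, holds
```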
Abstract:The matrix-based Rényi's α-order entropy functional was recently introduced using the normalized eigenspectrum of a Hermitian matrix of the projected data in a reproducing kernel Hilbert space (RKHS). However, the current theory of the matrix-based Rényi's α-order entropy functional only defines the entropy of a single variable or the mutual information between two random variables. In the information theory and machine learning communities, one is also frequently interested in multivariate information quantities, such as the multivariate joint entropy and various interactive quantities among multiple variables. In this paper, we first define the matrix-based Rényi's α-order joint entropy among multiple variables. We then show how this definition eases the estimation of various information quantities that measure the interactions among multiple variables, such as interaction information and total correlation. We finally present an application to feature selection to show how our definition provides a simple yet powerful way to estimate a widely acknowledged intractable quantity from data. A real example on hyperspectral image (HSI) band selection is also provided.
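A sketch of the multivariate definition: the joint entropy is the entropy of the normalized Hadamard product of the per-variable Gram matrices, from which quantities such as total correlation follow directly. It reuses `gram_matrix` and `renyi_entropy` from the first sketch; `sigma` and `alpha` remain free parameters.

```python
import numpy as np
# builds on gram_matrix / renyi_entropy from the first sketch

def joint_entropy(variables, alpha=1.01, sigma=1.0):
    """S_alpha(X1,...,Xk) from the Hadamard product of the Gram matrices."""
    H = None
    for v in variables:
        A = gram_matrix(v, sigma)
        H = A if H is None else H * A        # elementwise (Hadamard) product
    return renyi_entropy(H / np.trace(H), alpha)

def total_correlation(variables, alpha=1.01, sigma=1.0):
    """TC(X1,...,Xk) = sum_i S(Xi) - S(X1,...,Xk)."""
    marginals = sum(renyi_entropy(gram_matrix(v, sigma), alpha)
                    for v in variables)
    return marginals - joint_entropy(variables, alpha, sigma)
```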
Abstract:One important assumption underlying common classification models is the stationarity of the data. However, in real-world streaming applications, the data concept, characterized by the joint distribution of features and labels, is not stationary but drifts over time. Concept drift detection aims to detect such drifts and adapt the model so as to mitigate any deterioration in its predictive performance. Unfortunately, most existing concept drift detection methods rely on the strong and over-optimistic assumption that true labels are available immediately for all already-classified instances. In this paper, a novel Hierarchical Hypothesis Testing framework with a Request-and-Reverify strategy is developed to detect concept drifts by requesting labels only when necessary. Two methods, namely Hierarchical Hypothesis Testing with Classification Uncertainty (HHT-CU) and Hierarchical Hypothesis Testing with Attribute-wise "Goodness-of-fit" (HHT-AG), are proposed under this framework. In experiments with benchmark datasets, our methods demonstrate substantial advantages over state-of-the-art unsupervised drift detectors. More importantly, our methods even outperform DDM (a widely used supervised drift detector) while using significantly fewer labels.
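A skeleton of the Request-and-Reverify control flow, not the authors' exact statistics: layer 1 monitors a label-free statistic (here, mean classification uncertainty) against a Hoeffding-style bound, and labels are requested only when it fires, so that a layer-2 test on the true error rate can confirm or veto the alarm. The window size, `delta`, `base_error`, and the `request_errors` callback are all illustrative assumptions.

```python
import numpy as np

def request_and_reverify(uncertainties, request_errors, base_error,
                         window=100, delta=0.05):
    """Two-layer drift detection skeleton.

    uncertainties  : per-instance 1 - max class probability (no labels needed)
    request_errors : callback (start, end) -> 0/1 mistakes; used only on alarms
    base_error     : reference error rate from a stable period (assumption)
    """
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * window))   # Hoeffding-style bound
    ref = np.mean(uncertainties[:window])                 # calibration window
    drifts = []
    for start in range(window, len(uncertainties) - window, window):
        win = uncertainties[start:start + window]
        if abs(np.mean(win) - ref) > eps:                 # layer-1 alarm, label-free
            errs = request_errors(start, start + window)  # request labels now only
            if np.mean(errs) - base_error > eps:          # layer 2 confirms drift
                drifts.append(start)
                ref = np.mean(win)                        # reset the reference
    return drifts
```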
Abstract:In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value-of-information criterion, which measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during the search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to optimal regret that is logarithmic in the number of episodes.
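Under a KL-style information constraint, the value-of-information optimization yields a Gibbs (Boltzmann) distribution over arms, so a minimal sketch of the strategy reduces to soft-max exploration whose inverse temperature is annealed across episodes. The `beta0 * log(1 + t)` schedule and the incremental mean update are assumptions standing in for the paper's exact cooling schedule.

```python
import numpy as np

def voi_bandit(pull, n_arms, episodes, beta0=0.1, rng=None):
    """Boltzmann-style exploration with an annealed inverse temperature."""
    if rng is None:
        rng = np.random.default_rng(0)
    q = np.zeros(n_arms)                  # empirical mean reward per arm
    n = np.zeros(n_arms)
    for t in range(1, episodes + 1):
        beta = beta0 * np.log(1.0 + t)    # cooling: exploitation grows with t
        logits = beta * q
        p = np.exp(logits - logits.max())
        p /= p.sum()                      # Gibbs distribution over arms
        a = rng.choice(n_arms, p=p)
        r = pull(a)                       # user-supplied reward callback
        n[a] += 1
        q[a] += (r - q[a]) / n[a]         # incremental mean update
    return q, n
```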
Abstract:Reinforcement learning in environments with many action-state pairs is challenging. At issue is the number of episodes needed to thoroughly search the policy space. Most conventional heuristics address this search problem in a stochastic manner, which can leave large portions of the policy space unvisited during the early training stages. In this paper, we propose an uncertainty-based, information-theoretic approach for performing guided stochastic searches that more effectively cover the policy space. Our approach is based on the value of information, a criterion that provides the optimal trade-off between expected costs and the granularity of the search process. The value of information yields a stochastic routine for choosing actions during learning that can explore the policy space in a coarse-to-fine manner. We augment this criterion with a state-transition uncertainty factor, which guides the search process into previously unexplored regions of the policy space.
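A hedged sketch of the augmented action-selection rule: Gibbs sampling over Q-values plus a count-based bonus standing in for the state-transition uncertainty factor. The `1/sqrt(1 + n)` form and the weight `kappa` are illustrative choices, not the paper's exact factor.

```python
import numpy as np

def select_action(q_row, visit_row, beta, kappa=1.0, rng=None):
    """Gibbs action selection over Q-values for one state, augmented with a
    count-based uncertainty bonus that favors rarely visited transitions."""
    if rng is None:
        rng = np.random.default_rng(0)
    bonus = kappa / np.sqrt(1.0 + visit_row)   # high for unexplored transitions
    logits = beta * (q_row + bonus)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(q_row), p=p)
```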
Abstract:A linear model uses the space defined by the input to project the target (desired) signal and find the optimal set of model parameters. When the problem is nonlinear, adaptation requires nonlinear models for good performance, but these become slower and more cumbersome. In this paper, we propose a linear model called the Augmented Space Linear Model (ASLM), which uses the full joint space of the input and desired signal as the projection space and approaches the performance of nonlinear models. The new algorithm takes advantage of the linear solution and corrects the estimate for the current test input with the error assigned to that input's neighborhood in the input space during the training phase. The algorithm can thus solve nonlinear problems with the computational efficiency of linear methods, which can be regarded as a trade-off between accuracy and computational complexity. By making full use of the training data, the proposed augmented space model may provide a new way to improve many modeling tasks.
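One plausible reading of the abstract in code: solve ordinary least squares, store the training errors, and correct each test-time linear estimate with the error of the nearest training input. The brute-force nearest-neighbor lookup and the bias column are implementation conveniences, not claims about the authors' exact procedure.

```python
import numpy as np

class ASLMSketch:
    """Least squares plus a nearest-neighbor residual correction."""

    def fit(self, X, d):
        Xb = np.hstack([X, np.ones((len(X), 1))])    # bias column
        self.w, *_ = np.linalg.lstsq(Xb, d, rcond=None)
        self.X = X
        self.err = d - Xb @ self.w                   # stored training errors
        return self

    def predict(self, Xq):
        Xb = np.hstack([Xq, np.ones((len(Xq), 1))])
        lin = Xb @ self.w
        # correct each estimate with the error of the nearest training input
        d2 = ((Xq[:, None, :] - self.X[None, :, :])**2).sum(-1)
        return lin + self.err[np.argmin(d2, axis=1)]
```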
Abstract:The maximum correntropy criterion (MCC) has recently been successfully applied to robust regression, classification, and adaptive filtering, where correntropy is maximized instead of the well-known mean square error (MSE) being minimized, so as to improve robustness with respect to outliers (or impulsive noise). Considerable effort has been devoted to developing various robust adaptive algorithms under MCC, but so far little insight has been gained as to how the optimal solution is affected by outliers. In this work, we study this problem in the context of parameter estimation for a simple linear errors-in-variables (EIV) model in which all variables are scalar. Under certain conditions, we derive an upper bound on the absolute value of the estimation error and show that the optimal solution under MCC can be very close to the true value of the unknown parameter even with outliers (whose values can be arbitrarily large) in both the input and output variables. Illustrative examples are presented to verify and clarify the theory.
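For the scalar case described here, the MCC solution can be found with the standard half-quadratic (fixed-point) iteration: each step is a weighted least-squares solve in which samples with large errors receive exponentially small weights, which is precisely the mechanism behind the robustness to arbitrarily large outliers. The kernel width `sigma` and the iteration count are free parameters.

```python
import numpy as np

def mcc_scalar(x, y, sigma=1.0, iters=50, w0=0.0):
    """Fixed-point iteration for max_w sum_i k_sigma(y_i - w*x_i),
    with a Gaussian kernel k_sigma(e) = exp(-e^2 / (2*sigma^2))."""
    w = w0
    for _ in range(iters):
        e = y - w * x
        phi = np.exp(-e**2 / (2.0 * sigma**2))    # per-sample robustness weights
        # weighted least-squares solve: outliers contribute phi ~ 0
        w = np.sum(phi * x * y) / np.sum(phi * x**2)
    return w
```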
Abstract:Conventional reinforcement learning methods for Markov decision processes rely on weakly guided, stochastic searches to drive the learning process. It can therefore be difficult to predict which agent behaviors might emerge. In this paper, we consider an information-theoretic cost function for performing constrained stochastic searches that promote the formation of risk-averse to risk-favoring behaviors. This cost function is the value of information, which provides the optimal trade-off between the expected return of a policy and the policy's complexity; complexity is measured in bits and controlled by a single, tunable hyperparameter on the cost function. As the policy complexity is reduced, the agents increasingly eschew risky actions, which limits the potential for high accrued rewards. As the policy complexity increases, the agents take actions, regardless of the risk, that can raise the long-term rewards. We evaluate the performance of value-of-information-based policies on a stochastic version of Ms. Pac-Man. A major component of this paper is demonstrating that different ranges of policy complexity yield different game-play styles and explaining why this occurs. We also show that our reinforcement-learning search mechanism is more efficient than the others we evaluate. This result implies that the value of information is an appropriate criterion for framing the exploration-exploitation trade-off in reinforcement learning.
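A sketch of the free-energy form commonly associated with the value of information: expected return traded against policy complexity, here measured as the state-averaged KL divergence (in bits) between the policy and its marginal action prior, with a single hyperparameter `beta` setting the exchange rate. The array shapes and the use of the policy's own marginal as the prior are illustrative assumptions.

```python
import numpy as np

def voi_objective(pi, q, p_s, beta):
    """F(pi) = E[Q] - (1/beta) * sum_s p(s) * KL(pi(.|s) || prior).

    pi  : (S, A) row-stochastic policy (assumed strictly positive)
    q   : (S, A) action values
    p_s : (S,)   state distribution
    """
    pi = np.clip(pi, 1e-12, None)                    # numerical guard
    prior = (p_s[:, None] * pi).sum(0)               # marginal action prior
    expected_q = (p_s[:, None] * pi * q).sum()
    kl = (p_s * (pi * np.log2(pi / prior)).sum(1)).sum()  # complexity in bits
    return expected_q - kl / beta, kl
```

Sweeping `beta` from small to large values traces out the low-complexity (risk-averse) to high-complexity (risk-favoring) behaviors the abstract describes.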
Abstract:In this paper, we provide an approach to clustering relational matrices whose entries correspond to either similarities or dissimilarities between objects. Our approach is based on the value of information, a parameterized, information-theoretic criterion that measures the change in costs associated with changes in information. Optimizing the value of information yields a deterministic-annealing style of clustering with many benefits. For instance, investigators avoid having to specify the number of clusters a priori, as the partitions naturally undergo phase changes during the annealing process, whereby the number of clusters changes in a data-driven fashion. The global-best partition can also often be identified.
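A generic deterministic-annealing sketch for relational (dissimilarity) data, not the authors' value-of-information formulation: soft memberships follow a Gibbs distribution over per-cluster relational distortions, and raising `beta` sharpens the partition so the effective number of clusters emerges from the data. The annealing rate, initial `beta`, and `max_k` are illustrative knobs.

```python
import numpy as np

def da_relational(D, max_k=8, beta0=0.01, rate=1.05, steps=300, rng=None):
    """Deterministic-annealing-style clustering of a dissimilarity matrix D."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(D)
    P = rng.dirichlet(np.ones(max_k), size=n)   # soft memberships p(c|i)
    beta = beta0
    for _ in range(steps):
        w = P + 1e-9
        w /= w.sum(0, keepdims=True)            # within-cluster weights
        dist = D @ w                            # distortion of point i in cluster c
        logits = -beta * dist
        P = np.exp(logits - logits.max(1, keepdims=True))
        P /= P.sum(1, keepdims=True)            # Gibbs memberships
        beta *= rate                            # anneal: sharpen the partition
    k_eff = np.unique(P.argmax(1)).size         # data-driven cluster count
    return P, k_eff
```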