Most exploration research on reinforcement learning (RL) has paid attention to `the way of exploration', which is `how to explore'. The other exploration research, `when to explore', has not been the main focus of RL exploration research. The issue of `when' of a monolithic exploration in the usual RL exploration behaviour binds an exploratory action to an exploitational action of an agent. Recently, a non-monolithic exploration research has emerged to examine the mode-switching exploration behaviour of humans and animals. The ultimate purpose of our research is to enable an agent to decide when to explore or exploit autonomously. We describe the initial research of an autonomous multi-mode exploration of non-monolithic behaviour in an options framework. The higher performance of our method is shown against the existing non-monolithic exploration method through comparative experimental results.
Graph convolutional networks (GCNs) have achieved great success in graph representation learning by extracting high-level features from nodes and their topology. Since GCNs generally follow a message-passing mechanism, each node aggregates information from its first-order neighbour to update its representation. As a result, the representations of nodes with edges between them should be positively correlated and thus can be considered positive samples. However, there are more non-neighbour nodes in the whole graph, which provide diverse and useful information for the representation update. Two non-adjacent nodes usually have different representations, which can be seen as negative samples. Besides the node representations, the structural information of the graph is also crucial for learning. In this paper, we used quality-diversity decomposition in determinant point processes (DPP) to obtain diverse negative samples. When defining a distribution on diverse subsets of all non-neighbouring nodes, we incorporate both graph structure information and node representations. Since the DPP sampling process requires matrix eigenvalue decomposition, we propose a new shortest-path-base method to improve computational efficiency. Finally, we incorporate the obtained negative samples into the graph convolution operation. The ideas are evaluated empirically in experiments on node classification tasks. These experiments show that the newly proposed methods not only improve the overall performance of standard representation learning but also significantly alleviate over-smoothing problems.
Graph Convolutional Neural Networks (GCNs) has been generally accepted to be an effective tool for node representations learning. An interesting way to understand GCNs is to think of them as a message passing mechanism where each node updates its representation by accepting information from its neighbours (also known as positive samples). However, beyond these neighbouring nodes, graphs have a large, dark, all-but forgotten world in which we find the non-neighbouring nodes (negative samples). In this paper, we show that this great dark world holds a substantial amount of information that might be useful for representation learning. Most specifically, it can provide negative information about the node representations. Our overall idea is to select appropriate negative samples for each node and incorporate the negative information contained in these samples into the representation updates. Moreover, we show that the process of selecting the negative samples is not trivial. Our theme therefore begins by describing the criteria for a good negative sample, followed by a determinantal point process algorithm for efficiently obtaining such samples. A GCN, boosted by diverse negative samples, then jointly considers the positive and negative information when passing messages. Experimental evaluations show that this idea not only improves the overall performance of standard representation learning but also significantly alleviates over-smoothing problems.
Transfer learning where the behavior of extracting transferable knowledge from the source domain(s) and reusing this knowledge to target domain has become a research area of great interest in the field of artificial intelligence. Probabilistic graphical models (PGMs) have been recognized as a powerful tool for modeling complex systems with many advantages, e.g., the ability to handle uncertainty and possessing good interpretability. Considering the success of these two aforementioned research areas, it seems natural to apply PGMs to transfer learning. However, although there are already some excellent PGMs specific to transfer learning in the literature, the potential of PGMs for this problem is still grossly underestimated. This paper aims to boost the development of PGMs for transfer learning by 1) examining the pilot studies on PGMs specific to transfer learning, i.e., analyzing and summarizing the existing mechanisms particularly designed for knowledge transfer; 2) discussing examples of real-world transfer problems where existing PGMs have been successfully applied; and 3) exploring several potential research directions on transfer learning using PGM.
Causal effect estimation for dynamic treatment regimes (DTRs) contributes to sequential decision making. However, censoring and time-dependent confounding under DTRs are challenging as the amount of observational data declines over time due to a reducing sample size but the feature dimension increases over time. Long-term follow-up compounds these challenges. Another challenge is the highly complex relationships between confounders, treatments, and outcomes, which causes the traditional and commonly used linear methods to fail. We combine outcome regression models with treatment models for high dimensional features using uncensored subjects that are small in sample size and we fit deep Bayesian models for outcome regression models to reveal the complex relationships between confounders, treatments, and outcomes. Also, the developed deep Bayesian models can model uncertainty and output the prediction variance which is essential for the safety-aware applications, such as self-driving cars and medical treatment design. The experimental results on medical simulations of HIV treatment show the ability of the proposed method to obtain stable and accurate dynamic causal effect estimation from observational data, especially with long-term follow-up. Our technique provides practical guidance for sequential decision making, and policy-making.
The high-dimensional or sparse reward task of a reinforcement learning (RL) environment requires a superior potential controller such as hierarchical reinforcement learning (HRL) rather than an atomic RL because it absorbs the complexity of commands to achieve the purpose of the task in its hierarchical structure. One of the HRL issues is how to train each level policy with the optimal data collection from its experience. That is to say, how to synchronize adjacent level policies optimally. Our research finds that a HRL model through the off-policy correction technique of HRL, which trains a higher-level policy with the goal of reflecting a lower-level policy which is newly trained using the off-policy method, takes the critical role of synchronizing both level policies at all times while they are being trained. We propose a novel HRL model supporting the optimal level synchronization using the off-policy correction technique with a deep generative model. This uses the advantage of the inverse operation of a flow-based deep generative model (FDGM) to achieve the goal corresponding to the current state of the lower-level policy. The proposed model also considers the freedom of the goal dimension between HRL policies which makes it the generalized inverse model of the model-free RL in HRL with the optimal synchronization method. The comparative experiment results show the performance of our proposed model.
Graph neural networks (GNNs) extends the functionality of traditional neural networks to graph-structured data. Similar to CNNs, an optimized design of graph convolution and pooling is key to success. Borrowing ideas from physics, we propose a path integral based graph neural networks (PAN) for classification and regression tasks on graphs. Specifically, we consider a convolution operation that involves every path linking the message sender and receiver with learnable weights depending on the path length, which corresponds to the maximal entropy random walk. It generalizes the graph Laplacian to a new transition matrix we call maximal entropy transition (MET) matrix derived from a path integral formalism. Importantly, the diagonal entries of the MET matrix are directly related to the subgraph centrality, thus providing a natural and adaptive pooling mechanism. PAN provides a versatile framework that can be tailored for different graph data with varying sizes and structures. We can view most existing GNN architectures as special cases of PAN. Experimental results show that PAN achieves state-of-the-art performance on various graph classification/regression tasks, including a new benchmark dataset from statistical mechanics we propose to boost applications of GNN in physical sciences.
Unsupervised domain adaptation for classification tasks has achieved great progress in leveraging the knowledge in a labeled (source) domain to improve the task performance in an unlabeled (target) domain by mitigating the effect of distribution discrepancy. However, most existing methods can only handle unsupervised closed set domain adaptation (UCSDA), where the source and target domains share the same label set. In this paper, we target a more challenging but realistic setting: unsupervised open set domain adaptation (UOSDA), where the target domain has unknown classes that the source domain does not have. This study is the first to give the generalization bound of open set domain adaptation through theoretically investigating the risk of the target classifier on the unknown classes. The proposed generalization bound for open set domain adaptation has a special term, namely open set difference, which reflects the risk of the target classifier on unknown classes. According to this generalization bound, we propose a novel and theoretically guided unsupervised open set domain adaptation method: Distribution Alignment with Open Difference (DAOD), which is based on the structural risk minimization principle and open set difference regularization. The experiments on several benchmark datasets show the superior performance of the proposed UOSDA method compared with the state-of-the-art methods in the literature.
The cooperative hierarchical structure is a common and significant data structure observed in, or adopted by, many research areas, such as: text mining (author-paper-word) and multi-label classification (label-instance-feature). Renowned Bayesian approaches for cooperative hierarchical structure modeling are mostly based on topic models. However, these approaches suffer from a serious issue in that the number of hidden topics/factors needs to be fixed in advance and an inappropriate number may lead to overfitting or underfitting. One elegant way to resolve this issue is Bayesian nonparametric learning, but existing work in this area still cannot be applied to cooperative hierarchical structure modeling. In this paper, we propose a cooperative hierarchical Dirichlet process (CHDP) to fill this gap. Each node in a cooperative hierarchical structure is assigned a Dirichlet process to model its weights on the infinite hidden factors/topics. Together with measure inheritance from hierarchical Dirichlet process, two kinds of measure cooperation, i.e., superposition and maximization, are defined to capture the many-to-many relationships in the cooperative hierarchical structure. Furthermore, two constructive representations for CHDP, i.e., stick-breaking and international restaurant process, are designed to facilitate the model inference. Experiments on synthetic and real-world data with cooperative hierarchical structures demonstrate the properties and the ability of CHDP for cooperative hierarchical structure modeling and its potential for practical application scenarios.