Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the policy of the agent away from suboptimal solutions by leveraging incomplete offline demonstrations as references. This approach gradually expands the exploration scope of the agent and strives for optimality in a constrained optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptively clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a deduction of the worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid world mazes and several MuJoCo tasks. The extensive experimental results demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. Notably, the specific metrics and quantifiable results further support these findings. The code used in the study is available at \url{https://github.com/buaawgj/TACE}.
Autonomous individuals establish a structural complex system through pairwise connections and interactions. Notably, the evolution reflects the dynamic nature of each complex system, since it records a series of temporal changes from the past through the present and into the future. Different systems follow distinct evolutionary trajectories, which can serve as distinguishing traits for system classification. However, modeling a complex system's evolution is challenging for graph models because a graph is typically a static snapshot of a system and therefore cannot fully capture its long-term evolutionary traits. To address this challenge, we propose a heat-driven method for temporal graph augmentation. This approach combines the physics-based heat kernel with the DropNode technique to transform each static graph into a sequence of temporal ones. The sequence describes the evolutionary behaviour of the system, including the retention or disappearance of elements at each time point, based on the heat distributed on each node. Additionally, we propose a dynamic time-warping distance, GDTW, to quantitatively measure the distance between pairs of evolutionary systems through optimal matching. The resulting approach, called the Evolution Kernel method, has been successfully applied to classification problems on real-world structural graph datasets. The results show significant improvements in supervised classification accuracy over a series of baseline methods.
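As a rough illustration of the two ingredients named above (heat-driven node dropping and a dynamic time-warping distance), the following is a minimal sketch and not the authors' implementation. The function names, the rule "keep the nodes that retain the most heat", the `keep_ratio` parameter, and the Frobenius distance between snapshots are all assumptions made for this example; the abstract does not specify how GDTW compares individual snapshots.

```python
import numpy as np

def heat_kernel(adj, t):
    """Heat kernel exp(-t L) for the combinatorial Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    evals, evecs = np.linalg.eigh(lap)
    return (evecs * np.exp(-t * evals)) @ evecs.T

def temporal_sequence(adj, times, keep_ratio=0.8):
    """Turn one static graph into a sequence of node-masked snapshots.

    Assumption: at each time t, the nodes retaining the least heat
    (smallest diagonal entry of the heat kernel) are dropped.
    """
    n = adj.shape[0]
    n_keep = max(1, int(round(keep_ratio * n)))
    snapshots = []
    for t in times:
        heat = np.diag(heat_kernel(adj, t))   # heat retained on each node
        keep = np.argsort(-heat)[:n_keep]     # indices of the hottest nodes
        mask = np.zeros(n, dtype=bool)
        mask[keep] = True
        snapshots.append(adj * np.outer(mask, mask))
    return snapshots

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping over two snapshot sequences.

    Assumption: snapshots share the same shape and are compared with
    the Frobenius norm.
    """
    la, lb = len(seq_a), len(seq_b)
    cost = np.full((la + 1, lb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[la, lb]
```

Because the mask zeroes rows and columns rather than deleting them, every snapshot keeps the original shape, which is what lets this sketch compare snapshots entry by entry; graphs of different sizes would need a size-independent snapshot descriptor instead.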
Deep learning with deep neural networks (DNNs) has recently achieved great success in many important areas, such as computer vision, natural language processing, and recommendation systems. The lack of convexity of DNNs has been seen as a major disadvantage for many optimization methods, such as stochastic gradient descent, and it greatly limits the generalization of neural network applications. We argue that convexity makes sense in neural networks and propose the exponential multilayer neural network (EMLP), a class of parameter convex neural networks (PCNNs) that are convex with respect to the network parameters under conditions that can be realized. In addition, we propose a convexity metric for the two-layer exponential graph convolutional network (EGCN) and test how accuracy changes as the convexity metric changes. In later experiments, we use the same architecture to build the EGCN and evaluate it on a graph classification dataset, on which EGCN performs better than the graph convolutional network (GCN) and the graph attention network (GAT).
First-order methods such as stochastic gradient descent (SGD) are currently the most popular optimization methods for training deep neural networks (DNNs), whereas second-order methods are rarely used because of the high computational cost of obtaining second-order information. In this paper, we propose the Damped Newton Stochastic Gradient Descent (DN-SGD) method and the Stochastic Gradient Descent Damped Newton (SGD-DN) method to train DNNs for regression problems with Mean Square Error (MSE) and classification problems with Cross-Entropy Loss (CEL). Both methods are inspired by the proven fact that the Hessian matrix of the last layer of a DNN is always positive semi-definite. Unlike other second-order methods, which estimate the Hessian matrix of all parameters, our methods compute exact second-order information for only a small part of the parameters, which greatly reduces the computational cost and makes the learning process converge much faster and more accurately than SGD. Several numerical experiments on real datasets are performed to verify the effectiveness of our methods for regression and classification problems.
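To make the last-layer idea concrete, here is a minimal sketch of a single damped Newton update for a linear last layer under MSE loss. It is not the DN-SGD or SGD-DN algorithm itself: the function name, the `damping` and `step` parameters, and the choice to treat the penultimate-layer activations as a fixed `features` matrix are assumptions made for illustration. In a full training loop, the earlier layers would presumably still be updated by SGD.

```python
import numpy as np

def damped_newton_last_layer(features, targets, weights, damping=1e-3, step=1.0):
    """One damped Newton step for loss(W) = 0.5 / n * ||features @ W - targets||^2.

    features: (n, d) activations feeding the last layer (held fixed here)
    targets:  (n, k) regression targets
    weights:  (d, k) current last-layer weights
    """
    n, d = features.shape
    residual = features @ weights - targets
    grad = features.T @ residual / n
    # For a linear layer under MSE the Hessian is X^T X / n, which is
    # positive semi-definite; damping makes it strictly positive definite.
    hessian = features.T @ features / n
    damped = hessian + damping * np.eye(d)
    return weights - step * np.linalg.solve(damped, grad)
```

Only a `d x d` system is solved, where `d` is the width of the penultimate layer, so the cost is independent of the number of parameters in the rest of the network.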
Graphs are often used to organize data because of their simple topological structure, and they therefore play a key role in machine learning. The low-dimensional embedded representations obtained by graph representation learning are extremely useful in various typical tasks, such as node classification, content recommendation, and link prediction. However, existing methods mostly start from the microstructure of the graph (i.e., the edges) and ignore the mesoscopic structure (high-order local structure). Here, we propose wGCN -- a novel framework that uses random walks to obtain the node-specific mesoscopic structures of the graph, and uses these mesoscopic structures to reconstruct the graph and organize the feature information of the nodes. Our method can effectively generate node embeddings for previously unseen data, as demonstrated in a series of experiments on citation networks and social networks, in which it shows advantages over baseline methods. We believe that incorporating high-order local structural information can exploit the potential of the network more efficiently, which will greatly improve the learning efficiency of graph neural networks and promote the establishment of new learning models.
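One plausible reading of "using random walks to reconstruct the graph" is to reweight node pairs by how often they co-occur within short walks. The sketch below follows that reading only; it is not the wGCN procedure. The function names and the `walks_per_node`, `length`, and `window` parameters are hypothetical, and uniform neighbor sampling is an assumption.

```python
import random
from collections import defaultdict

def random_walk(adj_list, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj_list[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def cooccurrence_weights(adj_list, walks_per_node=10, length=5, window=2, seed=0):
    """Count how often two distinct nodes appear within `window` steps of a walk.

    Returns a dict mapping (u, v) with u < v to a positive count. Pairs that
    are not directly connected can still receive weight, which is how
    structure beyond single edges enters the reconstructed graph.
    """
    rng = random.Random(seed)
    weights = defaultdict(float)
    for node in adj_list:
        for _ in range(walks_per_node):
            walk = random_walk(adj_list, node, length, rng)
            for i, u in enumerate(walk):
                for v in walk[i + 1:i + 1 + window]:
                    if u != v:
                        weights[(min(u, v), max(u, v))] += 1.0
    return dict(weights)
```

The resulting weights could then serve as a weighted adjacency matrix for a standard graph convolution layer, although how wGCN actually consumes the walks is not stated in the abstract.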
The graph structure is a commonly used data storage mode, and the low-dimensional embedded representation of nodes in a graph is extremely useful in various typical tasks, such as node classification and link prediction. However, most existing approaches start from the binary relationships (i.e., edges) in the graph and do not leverage its higher-order local structure (i.e., motifs). Here, we propose mGCMN -- a novel framework that uses node feature information and the higher-order local structure of the graph to effectively generate node embeddings for previously unseen data. We find that different types of networks have different key motifs. The advantages of our method over baseline methods are demonstrated in a large number of experiments on citation network and social network datasets. We also observe a positive correlation between the increase in classification accuracy and the clustering coefficient. We believe that using higher-order structural information can truly realize the potential of the network, which will greatly improve the learning efficiency of graph neural networks and promote the establishment of new learning modes.
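As a small illustration of what "higher-order local structure" can mean in matrix form, the sketch below builds a triangle-motif adjacency matrix, in which entry (i, j) counts the triangles that contain edge (i, j), and uses it for one propagation step. This is an assumption-laden example, not the mGCMN architecture: the triangle is only one possible motif, and the function names and the row-normalized propagation rule are chosen for illustration.

```python
import numpy as np

def triangle_motif_adjacency(adj):
    """Entry (i, j) is the number of triangles containing edge (i, j).

    Assumes an undirected simple graph (symmetric adjacency, no self-loops).
    """
    a = (np.asarray(adj) > 0).astype(int)
    # (a @ a)[i, j] counts common neighbors of i and j; multiplying
    # elementwise by a keeps only pairs that are themselves connected.
    return (a @ a) * a

def motif_propagate(adj, features):
    """One row-normalized propagation step over the motif graph with self-loops."""
    m = triangle_motif_adjacency(adj) + np.eye(len(adj))
    return (m / m.sum(axis=1, keepdims=True)) @ features
```

A node that belongs to no triangle keeps its own features unchanged under this step, which shows why a motif-only operator is usually combined with the ordinary adjacency matrix in practice.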
Zero-shot learning (ZSL) aims to recognize novel object categories using the semantic representation of categories, and the key idea is to explore how a novel class is semantically related to the familiar classes. Typical models learn a proper embedding between the image feature space and the semantic space, but it is also important to learn discriminative features and to combine coarse-to-fine image features with semantic information. In this paper, we propose a discriminative embedding autoencoder with a regressor feedback model for ZSL. The encoder learns a mapping from the image feature space to the discriminative embedding space, which regulates both inter-class and intra-class distances between the learned features by a margin, making the learned features discriminative for object recognition. The regressor feedback learns to map the reconstructed samples back to the discriminative embedding and the semantic embedding, helping the decoder improve the quality of the samples and generalize to the unseen classes. The proposed model is validated extensively on four benchmark datasets: SUN, CUB, AWA1, and AWA2. The experimental results show that our proposed model outperforms the state-of-the-art models, with especially significant improvements in generalized zero-shot learning (GZSL).
This paper analyzes the scaling window of a random CSP model (i.e. model RB) for which we can identify the threshold points exactly, denoted by $r_{cr}$ or $p_{cr}$. For this model, we establish the scaling window $W(n,\delta)=(r_{-}(n,\delta), r_{+}(n,\delta))$ such that the probability of a random instance being satisfiable is greater than $1-\delta$ for $r<r_{-}(n,\delta)$ and is less than $\delta$ for $r>r_{+}(n,\delta)$. Specifically, we obtain the following result $$W(n,\delta)=(r_{cr}-\Theta(\frac{1}{n^{1-\epsilon}\ln n}), \ r_{cr}+\Theta(\frac{1}{n\ln n})),$$ where $0\leq\epsilon<1$ is a constant. A similar result with respect to the other parameter $p$ is also obtained. Since the instances generated by model RB have been shown to be hard at the threshold, this is the first attempt, as far as we know, to analyze the scaling window of such a model with hard instances.