Evolutionary Reinforcement Learning (ERL), which applies Evolutionary Algorithms (EAs) to optimize the weight parameters of Deep Neural Network (DNN) based policies, has been widely regarded as an alternative to traditional reinforcement learning methods. However, evaluating the iteratively generated population usually requires a large amount of computational time and can be prohibitively expensive, which may restrict the applicability of ERL. Surrogate models are often used to reduce the computational burden of evaluation in EAs. Unfortunately, in ERL, each individual (policy) usually encodes millions of weight parameters of a DNN. This high-dimensional policy representation poses a great challenge to applying surrogates in ERL to speed up training. This paper proposes the PE-SAERL framework, which for the first time enables surrogate-assisted evolutionary reinforcement learning via policy embedding (PE). Empirical results on five Atari games show that the proposed method performs more efficiently than four state-of-the-art algorithms. The training process is accelerated by up to 7x on the tested games, compared to the counterpart without the surrogate and PE.
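The core idea can be illustrated with a minimal sketch: embed high-dimensional policy weights into a low-dimensional space and fit a cheap surrogate there to pre-screen offspring before expensive rollouts. This is not the paper's architecture; the random-projection embedder and the 1-NN surrogate are assumptions for illustration only.

```python
import numpy as np

def make_embedder(dim, k, seed=0):
    """Fixed random projection: millions of weights -> k dimensions.
    (Illustrative stand-in for a learned policy embedding.)"""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((dim, k)) / np.sqrt(k)
    return lambda w: w @ proj

def surrogate_screen(embed, archive_w, archive_f, offspring, top):
    """Rank offspring by surrogate-predicted fitness (1-NN in embedding
    space against already-evaluated policies) and return the `top` most
    promising ones for real environment evaluation."""
    z_arch = np.stack([embed(w) for w in archive_w])
    preds = []
    for w in offspring:
        d = np.linalg.norm(z_arch - embed(w), axis=1)
        preds.append(archive_f[np.argmin(d)])  # fitness of nearest neighbor
    order = np.argsort(preds)[::-1]            # higher predicted fitness first
    return [offspring[i] for i in order[:top]]
```

Only the screened candidates would then be evaluated with costly rollouts, which is where the reported speedup comes from.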
An Attack Ensemble (AE), which combines multiple attacks, provides a reliable way to evaluate adversarial robustness. In practice, AEs are often constructed and tuned by human experts, which tends to be sub-optimal and time-consuming. In this work, we present AutoAE, a conceptually simple approach for automatically constructing AEs. In brief, AutoAE repeatedly adds to the ensemble the attack, with its iteration steps, that maximizes ensemble improvement per additional iteration consumed. We show theoretically that AutoAE yields AEs provably within a constant factor of the optimal for a given defense. We then use AutoAE to construct two AEs for $l_{\infty}$ and $l_2$ attacks, and apply them without any tuning or adaptation to 45 top adversarial defenses on the RobustBench leaderboard. In all but one case we achieve equal or better (often the latter) robustness evaluation than existing AEs, and notably, in 29 cases we achieve better robustness evaluation than the best known one. Such performance establishes AutoAE as a reliable evaluation protocol for adversarial robustness, and further indicates the great potential of automatic AE construction. Code is available at \url{https://github.com/LeegerPENG/AutoAE}.
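The greedy rule described above can be sketched as follows. This is a hypothetical simplification, not the released implementation: `candidates` and `evaluate_ensemble` are assumed interfaces, and robust accuracy on a validation set stands in for the real evaluation.

```python
def auto_ae(candidates, evaluate_ensemble, budget):
    """Greedily build an attack ensemble.

    candidates: list of (attack_name, iter_steps) options.
    evaluate_ensemble: maps an ensemble (list of picks) to the robust
        accuracy it yields (lower = stronger ensemble).
    budget: total iteration budget for the ensemble.
    """
    ensemble, used = [], 0
    base = evaluate_ensemble(ensemble)  # robust accuracy with no attack
    while True:
        best, best_gain = None, 0.0
        for attack, steps in candidates:
            if used + steps > budget:
                continue
            acc = evaluate_ensemble(ensemble + [(attack, steps)])
            # ensemble improvement per additional iteration consumed
            gain = (base - acc) / steps
            if gain > best_gain:
                best, best_gain = (attack, steps), gain
        if best is None:        # budget exhausted or no attack still helps
            return ensemble
        ensemble.append(best)
        used += best[1]
        base = evaluate_ensemble(ensemble)
```

The same attack may be picked repeatedly, matching the "repeatedly adds" phrasing; the loop stops once no candidate fits the remaining budget or improves the evaluation.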
Hierarchical reinforcement learning (HRL) effectively improves agents' exploration efficiency on tasks with sparse rewards, guided by high-quality hierarchical structures (e.g., subgoals or options). However, automatically discovering high-quality hierarchical structures remains a great challenge. Previous HRL methods can hardly discover hierarchical structures in complex environments due to the low efficiency of the randomness-driven exploration paradigm they exploit. To address this issue, we propose CDHRL, a causality-driven hierarchical reinforcement learning framework that leverages causality-driven discovery instead of randomness-driven exploration to effectively build high-quality hierarchical structures in complicated environments. The key insight is that the causalities among environment variables are naturally suited to modeling reachable subgoals and their dependencies, and can thus guide the construction of high-quality hierarchical structures. Results in two complex environments, 2D-Minecraft and Eden, show that CDHRL significantly boosts exploration efficiency with the causality-driven paradigm.
Traditional solvers for combinatorial optimization (CO) problems are usually designed by human experts. Recently, there has been a surge of interest in utilizing deep learning, especially deep reinforcement learning, to automatically learn effective solvers for CO. The resultant new paradigm is termed Neural Combinatorial Optimization (NCO). However, the advantages and disadvantages of NCO relative to other approaches have not been well studied empirically or theoretically. In this work, we present a comprehensive comparative study of NCO solvers and alternative solvers. Specifically, taking the Traveling Salesman Problem as the testbed, we assess the solvers in terms of five aspects: effectiveness, efficiency, stability, scalability, and generalization ability. Our results show that, in general, the solvers learned by NCO approaches still fall short of traditional solvers in nearly all of these aspects. A potential benefit of the former is their superior time and energy efficiency on small-size problem instances when sufficient training instances are available. We hope this work helps to better understand the strengths and weaknesses of NCO, and provides a comprehensive evaluation protocol for further benchmarking NCO approaches against other approaches.
Social recommendation utilizes social relations to enhance representation learning for recommendation. Most social recommendation models unify user representations for user-item interactions (the collaborative domain) and social relations (the social domain). However, such an approach may fail to model users' heterogeneous behavior patterns in the two domains, impairing the expressiveness of user representations. In this work, to address this limitation, we propose a novel Disentangled contrastive learning framework for social Recommendations (DcRec). More specifically, we propose to learn disentangled user representations from the item and social domains. Moreover, disentangled contrastive learning is designed to perform knowledge transfer between the disentangled user representations for social recommendation. Comprehensive experiments on various real-world datasets demonstrate the superiority of our proposed model.
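Cross-domain knowledge transfer of this kind is often instantiated as a contrastive objective between the two views of the same user. The sketch below uses a generic InfoNCE loss, which is an assumption for illustration, not necessarily DcRec's exact objective: a user's collaborative-domain and social-domain embeddings form a positive pair, and other users serve as negatives.

```python
import numpy as np

def info_nce(z_collab, z_social, tau=0.2):
    """z_collab, z_social: (n_users, d) embeddings from the two domains.
    Returns the mean contrastive loss; minimizing it pulls a user's two
    domain views together while pushing different users apart."""
    a = z_collab / np.linalg.norm(z_collab, axis=1, keepdims=True)
    b = z_social / np.linalg.norm(z_social, axis=1, keepdims=True)
    logits = a @ b.T / tau                  # similarity of every user pair
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))      # positives sit on the diagonal
```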
With the popularity of deep learning, hardware implementation platforms for deep learning have received increasing interest. Unlike general-purpose devices, e.g., CPUs or GPUs, where deep learning algorithms are executed at the software level, neural network hardware accelerators execute the algorithms directly to achieve both higher energy efficiency and better performance. However, as deep learning algorithms evolve frequently, the engineering effort and cost of designing hardware accelerators have greatly increased. To improve design quality while saving cost, design automation for neural network accelerators has been proposed, where design space exploration algorithms automatically search for an optimized accelerator design within a design space. Nevertheless, the increasing complexity of neural network accelerators increases the dimensionality of the design space. As a result, previous design space exploration algorithms are no longer effective enough to find an optimized design. In this work, we propose a neural network accelerator design automation framework named GANDSE, where we rethink the problem of design space exploration and propose a novel approach based on the generative adversarial network (GAN) to support optimized exploration of high-dimensional, large design spaces. Experiments show that GANDSE is able to find more optimized designs in negligible time compared with approaches including multilayer perceptron and deep reinforcement learning.
Security issues in DNNs, such as adversarial examples, have attracted much attention. Adversarial examples are examples capable of inducing DNNs to return completely wrong predictions through carefully designed perturbations. Obviously, adversarial examples bring great security risks to the development of deep learning. Recently, some defense approaches against adversarial examples have been proposed; however, in our opinion, their performance is still limited. In this paper, we propose a new ensemble defense approach named the Negative Correlation Ensemble (NCEn), which achieves compelling results by making the gradient directions and gradient magnitudes of each member in the ensemble negatively correlated, and at the same time reducing the transferability of adversarial examples among them. Extensive experiments have been conducted, and the results demonstrate that NCEn can effectively improve the adversarial robustness of ensembles.
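A minimal way to quantify the alignment being penalized is the mean pairwise cosine similarity of the members' input gradients. The sketch below is illustrative only (not the paper's exact loss): minimizing this term drives members toward negatively correlated gradient directions, which is what limits adversarial transferability among them.

```python
import numpy as np

def gradient_correlation_penalty(grads):
    """grads: array of shape (n_members, d), each row the input gradient
    of one ensemble member for the same example. Returns the mean
    pairwise cosine similarity (positive = aligned, to be penalized)."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.clip(norms, 1e-12, None)
    cos = unit @ unit.T                      # pairwise cosine similarities
    n = len(grads)
    off_diag = cos[~np.eye(n, dtype=bool)]   # drop each member's self-similarity
    return off_diag.mean()
```

In training, such a term would be added to the members' classification losses so that an adversarial direction effective against one member is unlikely to transfer to the others.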
Deep neural networks are vulnerable to adversarial examples, even in the black-box setting where the attacker only has access to the model output. Recent studies have devised effective black-box attacks with high query efficiency. However, such performance is often accompanied by compromises in attack imperceptibility, hindering the practical use of these approaches. In this paper, we propose to restrict the perturbations to a small salient region to generate adversarial examples that can hardly be perceived. This approach is readily compatible with many existing black-box attacks and can significantly improve their imperceptibility with little degradation in attack success rate. Further, we propose the Saliency Attack, a new black-box attack that refines the perturbations in the salient region to achieve even better imperceptibility. Extensive experiments show that compared to state-of-the-art black-box attacks, our approach achieves much better imperceptibility scores, including most apparent distortion (MAD), $L_0$ and $L_2$ distances, and also obtains significantly higher success rates judged by a human-like threshold on MAD. Importantly, the perturbations generated by our approach are interpretable to some extent. Finally, it is also demonstrated to be robust to different detection-based defenses.
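The compatibility with existing attacks comes from the masking step being a simple post-processing of any proposed perturbation. A minimal sketch, assuming a precomputed binary saliency `mask` from some saliency model (not specified here) and an $l_\infty$ budget:

```python
import numpy as np

def mask_perturbation(x, delta, mask, eps):
    """x: clean image in [0, 1]; delta: perturbation proposed by any
    black-box attack; mask: binary saliency mask (1 = salient pixel);
    eps: l_inf budget. Zeroes the perturbation outside the salient
    region and returns a valid adversarial candidate."""
    delta = np.clip(delta, -eps, eps) * mask   # restrict to salient region
    return np.clip(x + delta, 0.0, 1.0)        # keep a valid image
```

Any query-based attack can call this between proposing a perturbation and querying the model, which is why the restriction composes with existing methods.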
Grassland restoration is a critical means of combating grassland ecological degradation. This work considers the problem of maximizing the restored area in a UAV-enabled grassland restoration process, a precise restoration scheme. We first formulate the problem as a multivariable combinatorial optimization problem. By analyzing its characteristics, the problem can be decomposed into two stages: UAV trajectory design and restoration area allocation. On this basis, it can be regarded as a composite of the traveling salesman problem (TSP) and the multidimensional knapsack problem (MKP). Unlike a single combinatorial optimization problem, the coupling between the two stages makes the problem difficult to solve directly with traditional single-stage methods. To solve it effectively without ignoring the dependence between the two stages, we develop a cooperative optimization algorithm, called CHAPBILM, based on a heuristic algorithm and population-based incremental learning (PBIL) incorporating a maximum-residual-energy-based local search (MRELS) strategy, to deal with this problem under constraints on the UAV energy, the total seed weight, and the number of restored areas. Simulation studies demonstrate that the proposed cooperative optimization method is effective for the UAV-enabled grassland restoration problem and significantly outperforms noncooperative optimization methods, which also verifies the dependency between UAV trajectory design and restoration area allocation.
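For the allocation stage, PBIL maintains a probability vector over a binary selection (e.g., which areas to restore) and shifts it toward the best sampled solution each generation. The sketch below is a generic PBIL step, not CHAPBILM itself; the fitness function and population settings are placeholders.

```python
import random

def pbil_step(p, fitness, pop_size=20, lr=0.1, rng=random):
    """One PBIL generation: sample bitstrings from probability vector p,
    evaluate them, then shift p toward the best sample.

    p: list of probabilities, p[i] = Pr(bit i == 1).
    fitness: maps a bitstring (list of 0/1) to a score to maximize.
    Returns (updated probability vector, best sampled bitstring)."""
    pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    # learning-rate update pulls p toward the winning bitstring
    return [(1 - lr) * pi + lr * bi for pi, bi in zip(p, best)], best
```

In the cooperative scheme described above, the fitness of an allocation would itself depend on the trajectory produced by the heuristic TSP stage, which is what couples the two stages.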
Quantizing deep neural networks (DNNs) has been a promising solution for deploying them on embedded devices. However, most existing methods do not quantize gradients, and the quantization process itself still involves many floating-point operations, which hinders further applications of quantized DNNs. To solve this problem, we propose a new heuristic method based on cooperative coevolution for quantizing DNNs. Under the cooperative coevolution framework, we use an estimation of distribution algorithm to search for the low-bit weights. Specifically, we first construct an initial quantized network from a pre-trained network instead of random initialization, and then search from it with a restricted search space. To our knowledge, this is the largest discrete problem solved by evolutionary algorithms so far. Experiments show that our method can train a 4-bit ResNet-20 on the CIFAR-10 dataset without sacrificing accuracy.
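A toy sketch of the framework's structure, heavily simplified: decompose the discrete variable vector into groups (the cooperative coevolution part) and evolve each group with a univariate estimation of distribution algorithm while the other groups stay fixed at the current best solution. The binary values, group scheme, and fitness here are illustrative assumptions, not the paper's actual low-bit weight search.

```python
import random

def cc_eda(n_vars, group_size, fitness, values=(0, 1), gens=10, pop=20, lr=0.2, rng=None):
    """Cooperative coevolution with a univariate EDA per subcomponent.
    Each group is resampled from its probability model while the rest of
    the solution is held at the current best, then the model is shifted
    toward the group's winner."""
    rng = rng or random.Random(0)
    best = [values[0]] * n_vars
    p = [0.5] * n_vars  # Pr(variable i takes values[1])
    groups = [range(i, min(i + group_size, n_vars))
              for i in range(0, n_vars, group_size)]
    for _ in range(gens):
        for g in groups:
            samples = []
            for _ in range(pop):
                cand = best[:]           # other groups fixed at current best
                for i in g:
                    cand[i] = values[1] if rng.random() < p[i] else values[0]
                samples.append(cand)
            winner = max(samples, key=fitness)
            for i in g:                  # shift the distribution toward the winner
                p[i] = (1 - lr) * p[i] + lr * (winner[i] == values[1])
            best = winner
    return best
```

In the described method, the variables would be the quantized weights of a subnetwork and the starting distribution would come from the pre-trained network rather than from 0.5 everywhere, which is what restricts the search space.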