This document describes the generalized moving peaks benchmark (GMPB) and how it can be used to generate problem instances for continuous large-scale dynamic optimization problems. It presents a set of 15 benchmark problems, the relevant source code, and a performance indicator, designed for comparative studies and competitions in large-scale dynamic optimization. Although its primary purpose is to provide a coherent basis for running competitions, its generality allows the interested reader to use this document as a guide to design customized problem instances to investigate issues beyond the scope of the presented benchmark suite. To this end, we explain the modular structure of the GMPB and how its constituents can be assembled to form problem instances with a variety of controllable characteristics ranging from unimodal to highly multimodal, symmetric to highly asymmetric, smooth to highly irregular, and various degrees of variable interaction and ill-conditioning.
This paper proposes Differential-Critic Generative Adversarial Network (DiCGAN) to learn the distribution of user-desired data when only partial instead of the entire dataset possesses the desired property, which generates desired data that meets user's expectations and can assist in designing biological products with desired properties. Existing approaches select the desired samples first and train regular GANs on the selected samples to derive the user-desired data distribution. However, the selection of the desired data relies on an expert criterion and supervision over the entire dataset. DiCGAN introduces a differential critic that can learn the preference direction from the pairwise preferences, which is amateur knowledge and can be defined on part of the training data. The resultant critic guides the generation of the desired data instead of the whole data. Specifically, apart from the Wasserstein GAN loss, a ranking loss of the pairwise preferences is defined over the critic. It endows the difference of critic values between each pair of samples with the pairwise preference relation. The higher critic value indicates that the sample is preferred by the user. Thus training the generative model for higher critic values encourages the generation of user-preferred samples. Extensive experiments show that our DiCGAN achieves state-of-the-art performance in learning the user-desired data distributions, especially in the cases of insufficient desired data and limited supervision.
Gridless methods show great superiority in line spectral estimation. These methods need to solve an atomic $l_0$ norm (i.e., the continuous analog of $l_0$ norm) minimization problem to estimate frequencies and model order. Since this problem is NP-hard to compute, relaxations of atomic $l_0$ norm, such as nuclear norm and reweighted atomic norm, have been employed for promoting sparsity. However, the relaxations give rise to a resolution limit, subsequently leading to biased model order and convergence error. To overcome the above shortcomings of relaxation, we propose a novel idea of simultaneously estimating the frequencies and model order by means of the atomic $l_0$ norm. To accomplish this idea, we build a multiobjective optimization model. The measurment error and the atomic $l_0$ norm are taken as the two optimization objectives. The proposed model directly exploits the model order via the atomic $l_0$ norm, thus breaking the resolution limit. We further design a variable-length evolutionary algorithm to solve the proposed model, which includes two innovations. One is a variable-length coding and search strategy. It flexibly codes and interactively searches diverse solutions with different model orders. These solutions act as steppingstones that help fully exploring the variable and open-ended frequency search space and provide extensive potentials towards the optima. Another innovation is a model order pruning mechanism, which heuristically prunes less contributive frequencies within the solutions, thus significantly enhancing convergence and diversity. Simulation results confirm the superiority of our approach in both frequency estimation and model order selection.
The source number identification is an essential step in direction-of-arrival (DOA) estimation. Existing methods may provide a wrong source number due to inferior statistical properties (in low SNR or limited snapshots) or modeling errors (caused by relaxing sparse penalties), especially in impulsive noise. To address this issue, we propose a novel idea of simultaneous source number identification and DOA estimation. We formulate a multiobjective off-grid DOA estimation model to realize this idea, by which the source number can be automatically identified together with DOA estimation. In particular, the source number is properly exploited by the $l_0$ norm of impinging signals without relaxations, guaranteeing accuracy. Furthermore, we design a multiobjective bilevel evolutionary algorithm to solve the proposed model. The source number identification and sparse recovery are simultaneously optimized at the on-grid (lower) level. A forward search strategy is developed to further refine the grid at the off-grid (upper) level. This strategy does not need linear approximations and can eliminate the off-grid gap with low computational complexity. Simulation results demonstrate the outperformance of our method in terms of source number and root mean square error.
This document describes the Generalized Moving Peaks Benchmark (GMPB) that generates continuous dynamic optimization problem instances. The landscapes generated by GMPB are constructed by assembling several components with a variety of controllable characteristics ranging from unimodal to highly multimodal, symmetric to highly asymmetric, smooth to highly irregular, and various degrees of variable interaction and ill-conditioning. In this document, we explain how these characteristics can be generated by different parameter settings of GMPB. The MATLAB source code of GMPB is also explained. This document forms the basis for a range of competitions on Evolutionary Continuous Dynamic Optimization in the upcoming well-known conferences.
The performance of deep reinforcement learning methods prone to degenerate when applied to environments with non-stationary dynamics. In this paper, we utilize the latent context recurrent encoders motivated by recent Meta-RL materials, and propose the Latent Context-based Soft Actor Critic (LC-SAC) method to address aforementioned issues. By minimizing the contrastive prediction loss function, the learned context variables capture the information of the environment dynamics and the recent behavior of the agent. Then combined with the soft policy iteration paradigm, the LC-SAC method alternates between soft policy evaluation and soft policy improvement until it converges to the optimal policy. Experimental results show that the performance of LC-SAC is significantly better than the SAC algorithm on the MetaWorld ML1 tasks whose dynamics changes drasticly among different episodes, and is comparable to SAC on the continuous control benchmark task MuJoCo whose dynamics changes slowly or doesn't change between different episodes. In addition, we also conduct relevant experiments to determine the impact of different hyperparameter settings on the performance of the LC-SAC algorithm and give the reasonable suggestions of hyperparameter setting.
Deep reinforcement learning methods have shown great performance on many challenging cooperative multi-agent tasks. Two main promising research directions are multi-agent value function decomposition and multi-agent policy gradients. In this paper, we propose a new decomposed multi-agent soft actor-critic (mSAC) method, which effectively combines the advantages of the aforementioned two methods. The main modules include decomposed Q network architecture, discrete probabilistic policy and counterfactual advantage function (optinal). Theoretically, mSAC supports efficient off-policy learning and addresses credit assignment problem partially in both discrete and continuous action spaces. Tested on StarCraft II micromanagement cooperative multiagent benchmark, we empirically investigate the performance of mSAC against its variants and analyze the effects of the different components. Experimental results demonstrate that mSAC significantly outperforms policy-based approach COMA, and achieves competitive results with SOTA value-based approach Qmix on most tasks in terms of asymptotic perfomance metric. In addition, mSAC achieves pretty good results on large action space tasks, such as 2c_vs_64zg and MMM2.
Many real-world applications have the time-linkage property, and the only theoretical analysis is recently given by Zheng, et al. (TEVC 2021) on their proposed time-linkage OneMax problem, OneMax$_{(0,1^n)}$. However, only two elitist algorithms (1+1)EA and ($\mu$+1)EA are analyzed, and it is unknown whether the non-elitism mechanism could help to escape the local optima existed in OneMax$_{(0,1^n)}$. In general, there are few theoretical results on the benefits of the non-elitism in evolutionary algorithms. In this work, we analyze on the influence of the non-elitism via comparing the performance of the elitist (1+$\lambda$)EA and its non-elitist counterpart (1,$\lambda$)EA. We prove that with probability $1-o(1)$ (1+$\lambda$)EA will get stuck in the local optima and cannot find the global optimum, but with probability $1$, (1,$\lambda$)EA can reach the global optimum and its expected runtime is $O(n^{3+c}\log n)$ with $\lambda=c \log_{\frac{e}{e-1}} n$ for the constant $c\ge 1$. Noting that a smaller offspring size is helpful for escaping from the local optima, we further resort to the compact genetic algorithm where only two individuals are sampled to update the probabilistic model, and prove its expected runtime of $O(n^3\log n)$. Our computational experiments also verify the efficiency of the two non-elitist algorithms.