In the field of reinforcement learning, because of the high cost and risk of policy training in the real world, policies are trained in a simulation environment and transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, lead to model misspecification. Multiple studies report significant deterioration of policy performance in a real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance on the uncertainty parameter set to guarantee the performance in the corresponding real-world environment. To obtain a policy for the optimization, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibited a worst-case performance superior to several baseline approaches.
Evolution strategy (ES) is one of promising classes of algorithms for black-box continuous optimization. Despite its broad successes in applications, theoretical analysis on the speed of its convergence is limited on convex quadratic functions and their monotonic transformation. In this study, an upper bound and a lower bound of the rate of linear convergence of the (1+1)-ES on locally $L$-strongly convex functions with $U$-Lipschitz continuous gradient are derived as $\exp\left(-\Omega_{d\to\infty}\left(\frac{L}{d\cdot U}\right)\right)$ and $\exp\left(-\frac1d\right)$, respectively. Notably, any prior knowledge on the mathematical properties of the objective function such as Lipschitz constant is not given to the algorithm, whereas the existing analyses of derivative-free optimization algorithms require them.
In real-world applications of multi-class classification models, misclassification in an important class (e.g., stop sign) can be significantly more harmful than in other classes (e.g., speed limit). In this paper, we propose a loss function that can improve the recall of an important class while maintaining the same level of accuracy as the case using cross-entropy loss. For our purpose, we need to make the separation of the important class better than the other classes. However, existing methods that give a class-sensitive penalty for cross-entropy loss do not improve the separation. On the other hand, the method that gives a margin to the angle between the feature vectors and the weight vectors of the last fully connected layer corresponding to each feature can improve the separation. Therefore, we propose a loss function that can improve the separation of the important class by setting the margin only for the important class, called Class-sensitive Additive Angular Margin Loss (CAMRI Loss). CAMRI loss is expected to reduce the variance of angles between features and weights of the important class relative to other classes due to the margin around the important class in the feature space by adding a penalty to the angle. In addition, concentrating the penalty only on the important classes hardly sacrifices the separation of the other classes. Experiments on CIFAR-10, GTSRB, and AwA2 showed that the proposed method could improve up to 9% recall improvement on cross-entropy loss without sacrificing accuracy.
In this study, we investigate the problem of min-max continuous optimization in a black-box setting $\min_{x} \max_{y}f(x,y)$. A popular approach updates $x$ and $y$ simultaneously or alternatingly. However, two major limitations have been reported in existing approaches. (I) As the influence of the interaction term between $x$ and $y$ (e.g., $x^\mathrm{T} B y$) on the Lipschitz smooth and strongly convex-concave function $f$ increases, the approaches converge to an optimal solution at a slower rate. (II) The approaches fail to converge if $f$ is not Lipschitz smooth and strongly convex-concave around the optimal solution. To address these difficulties, we propose minimizing the worst-case objective function $F(x)=\max_{y}f(x,y)$ directly using the covariance matrix adaptation evolution strategy, in which the rankings of solution candidates are approximated by our proposed worst-case ranking approximation (WRA) mechanism. Compared with existing approaches, numerical experiments show two important findings regarding our proposed method. (1) The proposed approach is efficient in terms of $f$-calls on a Lipschitz smooth and strongly convex-concave function with a large interaction term. (2) The proposed approach can converge on functions that are not Lipschitz smooth and strongly convex-concave around the optimal solution, whereas existing approaches fail.
A surrogate function is often employed to reduce the number of objective function evaluations for optimization. However, the effect of using a surrogate model in evolutionary approaches has not been theoretically investigated. This paper theoretically analyzes the information-geometric optimization framework using a surrogate function. The value of the expected objective function under the candidate sampling distribution is used as the measure of progress of the algorithm. We assume that the surrogate function is maintained so that the population version of the Kendall's rank correlation coefficient between the surrogate function and the objective function under the candidate sampling distribution is greater than or equal to a predefined threshold. We prove that information-geometric optimization using such a surrogate function leads to a monotonic decrease in the expected objective function value if the threshold is sufficiently close to one. The acceptable threshold value is analyzed for the case of the information-geometric optimization instantiated with Gaussian distributions, i.e., the rank-$\mu$ update CMA-ES, on a convex quadratic objective function. As an alternative to the Kendall's rank correlation coefficient, we investigate the use of the Pearson correlation coefficient between the weights assigned to candidate solutions based on the objective function and the surrogate function.
This paper proposes a two-phase framework with a B\'{e}zier simplex-based interpolation method (TPB) for computationally expensive multi-objective optimization. The first phase in TPB aims to approximate a few Pareto optimal solutions by optimizing a sequence of single-objective scalar problems. The first phase in TPB can fully exploit a state-of-the-art single-objective derivative-free optimizer. The second phase in TPB utilizes a B\'{e}zier simplex model to interpolate the solutions obtained in the first phase. The second phase in TPB fully exploits the fact that a B\'{e}zier simplex model can approximate the Pareto optimal solution set by exploiting its simplex structure when a given problem is simplicial. We investigate the performance of TPB on the 55 bi-objective BBOB problems. The results show that TPB performs significantly better than HMO-CMA-ES and some state-of-the-art meta-model-based optimizers.
We aim to explain a black-box classifier with the form: `data X is classified as class Y because X \textit{has} A, B and \textit{does not have} C' in which A, B, and C are high-level concepts. The challenge is that we have to discover in an unsupervised manner a set of concepts, i.e., A, B and C, that is useful for the explaining the classifier. We first introduce a structural generative model that is suitable to express and discover such concepts. We then propose a learning process that simultaneously learns the data distribution and encourages certain concepts to have a large causal influence on the classifier output. Our method also allows easy integration of user's prior knowledge to induce high interpretability of concepts. Using multiple datasets, we demonstrate that our method can discover useful binary concepts for explanation.
We propose an approach to saddle point optimization relying only on an oracle that solves a minimization problem approximately. We analyze its convergence property on a strongly convex--concave problem and show its linear convergence toward the global min--max saddle point. Based on the convergence analysis, we propose a heuristic approach to adapt the learning rate for the proposed saddle point optimization approach. The implementation of the proposed approach using the (1+1)-CMA-ES as the minimization oracle, namely Adversarial-CMA-ES, is evaluated on test problems. Numerical evaluation reveals the tightness of the theoretical convergence rate bound as well as the efficiency of the learning rate adaptation mechanism. As an example of real-world applications, it is applied to automatic berthing control problems under model uncertainties, showing its usefulness in obtaining solutions robust under model uncertainties.
Video game level generation based on machine learning (ML), in particular, deep generative models, has attracted attention as a technique to automate level generation. However, applications of existing ML-based level generations are mostly limited to tile-based level representation. When ML techniques are applied to game domains with non-tile-based level representation, such as Angry Birds, where objects in a level are specified by real-valued parameters, ML often fails to generate playable levels. In this study, we develop a deep-generative-model-based level generation for the game domain of Angry Birds. To overcome these drawbacks, we propose a sequential encoding of a level and process it as text data, whereas existing approaches employ a tile-based encoding and process it as an image. Experiments show that the proposed level generator drastically improves the stability and diversity of generated levels compared with existing approaches. We apply latent variable evolution with the proposed generator to control the feature of a generated level computed through an AI agent's play, while keeping the level stable and natural.
A major approach to saddle point optimization $\min_x\max_y f(x, y)$ is a gradient based approach as is popularized by generative adversarial networks (GANs). In contrast, we analyze an alternative approach relying only on an oracle that solves a minimization problem approximately. Our approach locates approximate solutions $x'$ and $y'$ to $\min_{x'}f(x', y)$ and $\max_{y'}f(x, y')$ at a given point $(x, y)$ and updates $(x, y)$ toward these approximate solutions $(x', y')$ with a learning rate $\eta$. On locally strong convex--concave smooth functions, we derive conditions on $\eta$ to exhibit linear convergence to a local saddle point, which reveals a possible shortcoming of recently developed robust adversarial reinforcement learning algorithms. We develop a heuristic approach to adapt $\eta$ derivative-free and implement zero-order and first-order minimization algorithms. Numerical experiments are conducted to show the tightness of the theoretical results as well as the usefulness of the $\eta$ adaptation mechanism.