Kakade's natural policy gradient method has been studied extensively in recent years, with linear convergence established both with and without regularization. We study another natural gradient method, which is based on the Fisher information matrix of the state-action distributions and has received little attention from the theoretical side. Here, the state-action distributions follow the Fisher-Rao gradient flow inside the state-action polytope with respect to a linear potential. Therefore, we study Fisher-Rao gradient flows of linear programs more generally and show linear convergence with a rate that depends on the geometry of the linear program. Equivalently, this yields an estimate on the error induced by entropic regularization of the linear program, which improves on existing results. We extend these results and show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error. In particular, these general results cover the case of state-action natural policy gradients.
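As an illustration of the Fisher-Rao gradient flow of a linear program, the following minimal sketch treats only the simplest polytope, the probability simplex, which is an assumption made here and not the state-action polytope of the abstract. On the simplex the flow for a linear objective <c, p> is the replicator equation, whose solution is p_i(t) proportional to p_i(0) exp(t c_i); the same distribution maximizes <c, p> - (1/t) KL(p || p(0)), which is one way to see the link to entropic regularization. The names `fisher_rao_flow` and `suboptimality` and the numbers are hypothetical.

```python
import math


def fisher_rao_flow(p0, c, t):
    """Closed-form solution of dp_i/dt = p_i (c_i - <c, p>) on the simplex."""
    m = max(c)  # shift the exponent by max(c) for numerical stability
    weights = [p * math.exp(t * (ci - m)) for p, ci in zip(p0, c)]
    z = sum(weights)
    return [w / z for w in weights]


def suboptimality(p, c):
    """Gap between the optimal value max(c) and the current value <c, p>."""
    return max(c) - sum(pi * ci for pi, ci in zip(p, c))


# Hypothetical toy problem: maximize <c, p> over distributions on 3 points.
c = [1.0, 0.5, 0.0]
p0 = [1 / 3, 1 / 3, 1 / 3]
times = (0.0, 5.0, 10.0, 20.0)
gaps = [suboptimality(fisher_rao_flow(p0, c, t), c) for t in times]

for t, gap in zip(times, gaps):
    print(f"t = {t:5.1f}   gap = {gap:.3e}")
```

In this toy problem the gap decays exponentially in t, at a rate set by the difference between the best and second-best entries of c, which is a simple instance of a rate that depends on the geometry of the linear program.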
We propose energy natural gradient descent, a natural gradient method with respect to a Hessian-induced Riemannian metric as an optimization algorithm for physics-informed neural networks (PINNs) and the deep Ritz method. As a main motivation we show that the update direction in function space resulting from the energy natural gradient corresponds to the Newton direction modulo an orthogonal projection onto the model's tangent space. We demonstrate experimentally that energy natural gradient descent yields highly accurate solutions with errors several orders of magnitude smaller than what is obtained when training PINNs with standard optimizers like gradient descent or Adam, even when those are allowed significantly more computation time.
We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and Morimura and co-authors by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the penalization strength.
Monocular camera sensors are vital to intelligent vehicle operation and automated driving assistance and are also heavily employed in traffic control infrastructure. Calibrating the monocular camera, though, is time-consuming and often requires significant manual intervention. In this work, we present an extrinsic camera calibration approach that automates the parameter estimation by utilizing semantic segmentation information from images and point clouds. Our approach relies on a coarse initial measurement of the camera pose and builds on lidar sensors mounted on a vehicle with high-precision localization to capture a point cloud of the camera environment. Afterward, a mapping between the camera and world coordinate spaces is obtained by performing a lidar-to-camera registration of the semantically segmented sensor data. We evaluate our method on simulated and real-world data and demonstrate low errors in the calibration results. Our approach is suitable for infrastructure sensors as well as vehicle sensors, and it does not require motion of the camera platform.
The natural gradient field is a vector field that lives on a model equipped with a distinguished Riemannian metric, e.g., the Fisher-Rao metric, and represents the direction of steepest ascent of an objective function on the model with respect to this metric. In practice, one tries to obtain the corresponding direction on the parameter space by multiplying the ordinary gradient by the inverse of the Gram matrix associated with the metric. We refer to this vector on the parameter space as the natural parameter gradient. In this paper, we study when the pushforward of the natural parameter gradient is equal to the natural gradient. Furthermore, we investigate the invariance properties of the natural parameter gradient. Both questions are addressed in an overparametrised setting.
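The construction of the natural parameter gradient can be sketched on a toy overparametrised model. Everything below is an assumption chosen for illustration: a Bernoulli model with success probability p = sigmoid(theta_1 + theta_2), the Fisher-Rao metric 1 / (p (1 - p)) in the coordinate p, a hypothetical objective L(p) = -(p - 0.8)^2, and the Moore-Penrose pseudo-inverse standing in for the inverse because the Gram matrix is singular here. In this particular example the pushforward of the natural parameter gradient coincides with the natural gradient on the model; the sketch makes no claim about when this holds in general.

```python
import math


def pinv_sym2(m, tol=1e-12):
    """Moore-Penrose pseudo-inverse of a symmetric PSD 2x2 matrix."""
    (a, b), (_, d) = m
    det, tr = a * d - b * b, a + d
    if abs(det) > tol * max(1.0, tr * tr):  # full rank: ordinary inverse
        return [[d / det, -b / det], [-b / det, a / det]]
    if abs(tr) <= tol:  # zero matrix
        return [[0.0, 0.0], [0.0, 0.0]]
    # Rank one: M = lam * u u^T with lam = trace(M), so M^+ = M / lam^2.
    return [[a / tr**2, b / tr**2], [b / tr**2, d / tr**2]]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dL_dp(p):
    """Derivative of the hypothetical objective L(p) = -(p - 0.8)^2."""
    return -2.0 * (p - 0.8)


def natural_parameter_gradient(theta, dL_dp):
    p = sigmoid(theta[0] + theta[1])
    s = p * (1.0 - p)              # derivative of the sigmoid
    jac = [s, s]                   # Jacobian dp/dtheta (two redundant parameters)
    fisher = 1.0 / (p * (1.0 - p))  # Fisher-Rao metric in the coordinate p
    gram = [[fisher * ji * jk for jk in jac] for ji in jac]  # singular Gram matrix
    grad = [ji * dL_dp(p) for ji in jac]                     # ordinary gradient
    ginv = pinv_sym2(gram)
    nat = [sum(ginv[i][k] * grad[k] for k in range(2)) for i in range(2)]
    return p, jac, nat


p, jac, nat = natural_parameter_gradient([0.3, -0.1], dL_dp)
pushforward = sum(j * n for j, n in zip(jac, nat))  # image under dp/dtheta
natural_gradient = p * (1.0 - p) * dL_dp(p)         # metric inverse times dL/dp
print(pushforward, natural_gradient)
```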
Reliable tracking algorithms are essential for automated driving. However, the existing consistency measures are not sufficient to meet the increasing safety demands in the automotive sector. Therefore, this work presents a novel method for self-assessment of single-object tracking in clutter based on Kalman filtering and subjective logic. A key feature of the approach is that it additionally provides a measure of the collected statistical evidence in its online reliability scores. In this way, various aspects of reliability, such as the correctness of the assumed measurement noise, detection probability, and clutter rate, can be monitored in addition to the overall assessment based on the available evidence. Here, we present a mathematical derivation of the reference distribution used in our self-assessment module for our studied problem. Moreover, we introduce a formula that describes how a threshold should be chosen for the degree of conflict, the subjective logic comparison measure used for the reliability decision making. Our approach is evaluated in a challenging simulation scenario designed to model adverse weather conditions. The simulations show that our method can significantly improve the reliability checking of single-object tracking in clutter in several aspects.
Reward optimization in fully observable Markov decision processes is equivalent to a linear program over the polytope of state-action frequencies. Taking a similar perspective in the case of partially observable Markov decision processes with memoryless stochastic policies, the problem was recently formulated as the optimization of a linear objective subject to polynomial constraints. Based on this we present an approach for Reward Optimization in State-Action space (ROSA). We test this approach experimentally in maze navigation tasks. We find that ROSA is computationally efficient and can yield stability improvements over other existing methods.
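The linear-programming view of the fully observable case can be illustrated with a minimal sketch. It computes the discounted state-action frequencies of a fixed policy in a hypothetical two-state, two-action MDP (the transition kernel `P`, rewards `R`, initial distribution `MU`, and discount `GAMMA` are made-up numbers), then checks that they satisfy the linear flow constraints that define the polytope and that the expected reward is linear in them. It does not implement ROSA or the polynomial constraints of the partially observable case.

```python
GAMMA = 0.9
# Hypothetical MDP: P[s][a][s2] is the probability of moving from s to s2 under a.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0], [0.0, 2.0]]  # reward R[s][a]
MU = [0.5, 0.5]               # initial state distribution


def state_action_frequencies(policy, horizon=500):
    """eta(s, a) = (1 - GAMMA) * sum_t GAMMA^t * Prob(s_t = s, a_t = a), truncated."""
    n_s, n_a = len(P), len(P[0])
    eta = [[0.0] * n_a for _ in range(n_s)]
    d, w = MU[:], 1.0 - GAMMA  # state distribution at time t and its weight
    for _ in range(horizon):
        nxt = [0.0] * n_s
        for s in range(n_s):
            for a in range(n_a):
                sa = d[s] * policy[s][a]
                eta[s][a] += w * sa
                for s2 in range(n_s):
                    nxt[s2] += sa * P[s][a][s2]
        d, w = nxt, w * GAMMA
    return eta


def flow_residual(eta):
    """Residual of sum_a eta(s2, a) = (1 - GAMMA) MU(s2) + GAMMA sum_{s,a} P(s2|s,a) eta(s, a)."""
    n_s, n_a = len(P), len(P[0])
    res = []
    for s2 in range(n_s):
        inflow = sum(P[s][a][s2] * eta[s][a] for s in range(n_s) for a in range(n_a))
        res.append(sum(eta[s2]) - (1.0 - GAMMA) * MU[s2] - GAMMA * inflow)
    return res


def expected_reward(eta):
    """Normalized discounted reward, a linear function of eta."""
    return sum(e * r for e_row, r_row in zip(eta, R) for e, r in zip(e_row, r_row))


policy = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical stochastic policy pi(a | s)
eta = state_action_frequencies(policy)
print(eta, flow_residual(eta), expected_reward(eta))
```

Maximizing `expected_reward` over all nonnegative `eta` with zero `flow_residual` is then a linear program; any LP solver could be used for that step, which is left out here to keep the sketch dependency-free.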
We present a method that lowers the dose required for a ptychographic reconstruction by adaptively scanning the specimen, thereby providing the required spatial information redundancy in the regions of highest importance. The proposed method is built upon a deep learning model that is trained by reinforcement learning (RL), using prior knowledge of the specimen structure from training data sets. We show that equivalent low-dose experiments using adaptive scanning outperform conventional ptychography experiments in terms of reconstruction resolution.
Within the field of automated driving, there is a clear trend in environment perception towards more sensors, higher redundancy, and an overall increase in computational power. This is mainly driven by the paradigm of perceiving the entire environment as well as possible at all times. However, due to the ongoing rise in functional complexity, compromises have to be considered to ensure real-time capabilities of the perception system. In this work, we introduce a concept for situation-aware environment perception to control the resource allocation towards processing relevant areas within the data as well as towards employing only a subset of functional modules for environment perception, if sufficient for the current driving task. Specifically, we propose to evaluate the context of an automated vehicle to derive a multi-layer attention map (MLAM) that defines relevant areas. Using this MLAM, the optimal set of active functional modules is dynamically configured and intra-module processing of only relevant data is enforced. We demonstrate the feasibility of our concept using real-world data in a straightforward implementation for our system at hand. While retaining overall functionality, we reduce the accumulated processing time by 59%.
Motion planning at urban intersections that accounts for the situation context, handles occlusions, and deals with measurement and prediction uncertainty is a major challenge on the way to urban automated driving. In this work, we address this challenge with a sampling-based optimization approach. For this, we formulate an optimal control problem that optimizes for low risk and high passenger comfort. The risk is calculated on the basis of the perception information and the respective uncertainty using a risk model. The risk model combines set-based methods and probabilistic approaches. Thus, the approach provides safety guarantees in a probabilistic sense, while for a vanishing risk, the formal safety guarantees of the set-based methods are inherited. By exploring all available behavior options, our approach solves decision making and longitudinal trajectory planning in one step. The available behavior options are provided by a formal representation of the situation context, which is also used to reduce the computational effort. Occlusions are resolved using the external perception of infrastructure-mounted sensors. Yet, instead of merging external and ego perception with track-to-track fusion, the information is used in parallel. The motion planning scheme is validated through real-world experiments.