Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities. In this paper we consider the problem of finding optimal policies assuming that they belong to a reproducing kernel Hilbert space (RKHS). To that end we compute unbiased stochastic gradients of the value function, which we use as ascent directions to update the policy. A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed. This prevents these algorithms from being fully implemented online, which is a desirable property for systems that need to adapt to new tasks and/or environments in deployment. The main requirement for a policy gradient algorithm to work is that the estimate of the gradient at any point in time is an ascent direction for the initial value function. In this work we establish that this is indeed the case, which enables us to show the convergence of the online algorithm to the critical points of the initial value function. A numerical example shows the ability of our online algorithm to learn to solve a navigation and surveillance problem, in which an agent must loop between two goal locations. This example corroborates our theoretical findings about the ascent directions of subsequent stochastic gradients. It also shows how the agent running our online algorithm succeeds in learning to navigate, following a continuing cyclic trajectory that does not comply with the standard stationarity assumptions in the literature for non-episodic training.
Spherical signals are useful mathematical models for data arising in many 3-D applications such as LIDAR images, panorama cameras, and optical scanners. Successful processing of spherical signals entails architectures capable of exploiting their inherent data structure. In particular, spherical convolutional neural networks (Spherical CNNs) have shown promising performance in shape analysis and object recognition. In this paper, we focus on analyzing the properties that Spherical CNNs exhibit as they pertain to the rotational structure present in spherical signals. More specifically, we prove that they are equivariant to rotations and stable to rotation diffeomorphisms. These two properties illustrate how Spherical CNNs exploit the rotational structure of spherical signals, thus offering good generalization and faster learning. We corroborate these properties through controlled numerical experiments.
In this work we study the stability of algebraic neural networks (AlgNNs) with commutative algebras, which unify CNNs and GNNs under the umbrella of algebraic signal processing. An AlgNN is a stacked layered structure where each layer is composed of an algebra $\mathcal{A}$, a vector space $\mathcal{M}$ and a homomorphism $\rho:\mathcal{A}\rightarrow\text{End}(\mathcal{M})$, where $\text{End}(\mathcal{M})$ is the set of endomorphisms of $\mathcal{M}$. Signals in each layer are modeled as elements of $\mathcal{M}$ and are processed by elements of $\text{End}(\mathcal{M})$ defined according to the structure of $\mathcal{A}$ via $\rho$. This framework provides a general scenario that covers several types of neural network architectures where formal convolution operators are being used. We obtain stability conditions with respect to perturbations, which are defined as distortions of $\rho$, reaching general results whose particular cases are consistent with recent findings in the literature for CNNs and GNNs. We consider conditions on the domain of the homomorphisms in the algebra that lead to stable operators. Interestingly, we find that these conditions are related to the uniform boundedness of the Fr\'echet derivative of a function $p:\text{End}(\mathcal{M})\rightarrow\text{End}(\mathcal{M})$ that maps the images of the generators of $\mathcal{A}$ on $\text{End}(\mathcal{M})$ into a power series representation that defines the filtering of elements in $\mathcal{M}$. Additionally, our results show that stability is universal to convolutional architectures whose algebraic signal model uses the same algebra.
Graph Neural Networks (GNNs) are information processing architectures for signals supported on graphs. They are presented here as generalizations of convolutional neural networks (CNNs) in which individual layers contain banks of graph convolutional filters instead of banks of classical convolutional filters. Otherwise, GNNs operate as CNNs. Filters are composed with pointwise nonlinearities and stacked in layers. It is shown that GNN architectures exhibit equivariance to permutation and stability to graph deformations. These properties partly explain the good performance of GNNs that can be observed empirically. It is also shown that if graphs converge to a limit object, a graphon, GNNs converge to a corresponding limit object, a graphon neural network. This convergence justifies the transferability of GNNs across networks with different numbers of nodes.
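The permutation equivariance property can be checked numerically on a toy example. The following is a minimal sketch, assuming a single-feature layer built from a polynomial graph filter $\sum_k h_k S^k x$ followed by a ReLU; the adjacency matrix `S`, the signal `x`, the filter taps `h`, and all function names are illustrative choices, not taken from the paper.

```python
def matvec(S, x):
    """Multiply the square matrix S (list of rows) by the vector x."""
    return [sum(S[i][j] * x[j] for j in range(len(x))) for i in range(len(S))]

def graph_filter(S, x, h):
    """Polynomial graph filter: y = sum_k h[k] * S^k x."""
    y = [0.0] * len(x)
    z = list(x)  # holds S^k x, starting with k = 0
    for hk in h:
        y = [yi + hk * zi for yi, zi in zip(y, z)]
        z = matvec(S, z)
    return y

def gnn_layer(S, x, h):
    """One layer: graph filter composed with a pointwise ReLU."""
    return [max(0.0, v) for v in graph_filter(S, x, h)]

def permute_graph(S, perm):
    """Relabel nodes: entry (i, j) becomes S[perm[i]][perm[j]]."""
    n = len(S)
    return [[S[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

def permute_signal(x, perm):
    """Relabel the signal consistently with permute_graph."""
    return [x[p] for p in perm]

# Hypothetical 4-node undirected graph, signal, and filter taps.
S = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1.0, -2.0, 0.5, 3.0]
h = [0.5, -0.25, 0.1]
perm = [2, 0, 3, 1]

# Equivariance: permuting the output equals running the layer on the
# permuted graph and permuted signal.
out_then_perm = permute_signal(gnn_layer(S, x, h), perm)
perm_then_out = gnn_layer(permute_graph(S, perm), permute_signal(x, perm), h)
max_gap = max(abs(a - b) for a, b in zip(out_then_perm, perm_then_out))
```

Here `max_gap` should be zero up to floating-point rounding, because relabeling the nodes changes neither the filter taps nor the pointwise nonlinearity.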
This paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) networks. The resource allocation problem is modelled with a constrained stochastic optimization framework, which we exemplify with problems in power adaptation and relay selection. Under this framework, we develop two algorithms to solve FSO resource allocation problems. We first present the Stochastic Dual Gradient algorithm, which solves the problem exactly by exploiting the null duality gap but whose implementation necessarily requires explicit and accurate system models. As an alternative we present the Primal-Dual Deep Learning algorithm, which parametrizes the resource allocation policy with Deep Neural Networks (DNNs) and optimizes via a primal-dual method. The parametrized resource allocation problem incurs only a small loss of optimality due to the strong representational power of DNNs, and can moreover be implemented in an unsupervised manner without knowledge of system models. Numerical experiments demonstrate the superior performance of the proposed algorithms compared to baseline methods in a variety of resource allocation problems in FSO networks, including both continuous power allocation and binary relay selection.
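To illustrate the dual gradient idea on a power allocation problem, the following is a minimal deterministic sketch, not the paper's Stochastic Dual Gradient algorithm. It assumes a toy objective $\sum_i \log(1 + h_i p_i)$ with a total power budget, known channel gains `h`, and a fixed step size; all of these are hypothetical choices made for illustration.

```python
def water_filling(h, lam):
    """Maximizer of the Lagrangian over p >= 0 for a fixed multiplier lam > 0.

    Setting the derivative h_i / (1 + h_i p_i) - lam to zero gives
    p_i = 1/lam - 1/h_i, clipped at zero.
    """
    return [max(0.0, 1.0 / lam - 1.0 / hi) for hi in h]

def dual_gradient_power_allocation(h, budget, step=0.05, iters=500):
    """Projected gradient descent on the dual function of the toy problem."""
    lam = 1.0
    for _ in range(iters):
        p = water_filling(h, lam)
        # The constraint violation sum(p) - budget is the dual update direction.
        lam = max(1e-6, lam + step * (sum(p) - budget))
    return water_filling(h, lam), lam

# Hypothetical channel gains and power budget.
h = [2.0, 1.0, 0.25]
powers, lam = dual_gradient_power_allocation(h, budget=2.0)
```

Because this toy problem is concave with a zero duality gap, the primal allocation recovered from the converged multiplier uses the full budget and assigns no power to the weakest channel.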
In this paper, we consider the problem of computing the barycenter of a set of probability distributions under the Sinkhorn divergence. This problem has recently found applications across various domains, including graphics, learning, and vision, as it provides a meaningful mechanism to aggregate knowledge. Unlike previous approaches which directly operate in the space of probability measures, we recast the Sinkhorn barycenter problem as an instance of unconstrained functional optimization and develop a novel functional gradient descent method named Sinkhorn Descent (SD). We prove that SD converges to a stationary point at a sublinear rate, and under reasonable assumptions, we further show that it asymptotically finds a global minimizer of the Sinkhorn barycenter problem. Moreover, by providing a mean-field analysis, we show that SD preserves the weak convergence of empirical measures. Importantly, the computational complexity of SD scales linearly in the dimension $d$ and we demonstrate its scalability by solving a $100$-dimensional Sinkhorn barycenter problem.
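For readers unfamiliar with the entropic optimal transport computation that underlies the Sinkhorn divergence, the following is a minimal sketch of the standard Sinkhorn matrix-scaling iterations between two discrete distributions. It is not the Sinkhorn Descent method itself; the support points, weights, squared-distance cost, and regularization `eps` are assumptions chosen for illustration.

```python
import math

def sinkhorn_plan(a, b, cost, eps=0.5, iters=200):
    """Entropic-regularized transport plan between weight vectors a and b.

    Alternately rescales the rows and columns of the Gibbs kernel
    K = exp(-cost / eps) until the plan's marginals match a and b.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical one-dimensional supports and weights.
xs = [0.0, 0.5, 1.0]
ys = [0.25, 0.75]
a = [0.2, 0.5, 0.3]
b = [0.6, 0.4]
cost = [[(x - y) ** 2 for y in ys] for x in xs]

plan = sinkhorn_plan(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(a))) for j in range(len(b))]
```

After convergence, `row_sums` and `col_sums` should match `a` and `b` up to numerical tolerance, which is the defining constraint of a transport plan.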
Deterministic Policy Gradient (DPG) removes a level of randomness from standard randomized-action Policy Gradient (PG), and demonstrates substantial empirical success for tackling complex dynamic problems involving Markov decision processes. At the same time, though, DPG loses its ability to learn in a model-free (i.e., actor-only) fashion, frequently necessitating the use of critics in order to obtain consistent estimates of the associated policy-reward gradient. In this work, we introduce Zeroth-order Deterministic Policy Gradient (ZDPG), which approximates policy-reward gradients via two-point stochastic evaluations of the $Q$-function, constructed by properly designed low-dimensional action-space perturbations. Exploiting the idea of random horizon rollouts for obtaining unbiased estimates of the $Q$-function, ZDPG lifts the dependence on critics and restores true model-free policy learning, while enjoying built-in and provable algorithmic stability. Additionally, we present new finite sample complexity bounds for ZDPG, which improve upon existing results by up to two orders of magnitude. Our findings are supported by several numerical experiments, which showcase the effectiveness of ZDPG in a practical setting, and its advantages over both PG and Baseline PG.
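The two-point evaluation idea can be illustrated outside the reinforcement learning setting. The following is a minimal sketch of a generic two-point zeroth-order gradient estimator with perturbations drawn uniformly from the unit sphere, applied to a toy quadratic; it is not ZDPG, and the objective, smoothing radius `mu`, and sample count are assumptions made for illustration.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [gi / norm for gi in g]

def two_point_gradient(f, x, mu, rng):
    """Estimate grad f(x) from two function evaluations along a random direction."""
    d = len(x)
    u = random_unit_vector(d, rng)
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]

def toy_objective(x):
    """Hypothetical quadratic with minimizer at (1, -2, 0.5)."""
    target = [1.0, -2.0, 0.5]
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

rng = random.Random(0)
x0 = [0.0, 0.0, 0.0]
n_samples = 20000
avg = [0.0] * len(x0)
for _ in range(n_samples):
    g = two_point_gradient(toy_objective, x0, mu=0.1, rng=rng)
    avg = [a + gi / n_samples for a, gi in zip(avg, g)]

# Exact gradient of the toy objective at x0 is 2 * (x0 - target).
true_grad = [-2.0, 4.0, -1.0]
```

For a quadratic objective the symmetric difference removes the second-order term exactly, so the averaged estimate `avg` approaches `true_grad` as the number of samples grows; each individual estimate remains noisy.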
Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that an attenuating step-size is required for exact asymptotic convergence with the fact that a constant step-size learns faster in finite time up to an error. To do so, rather than fixing the mini-batch and the step-size at the outset, we propose a strategy to allow parameters to evolve adaptively. Specifically, the batch-size is set to be a piecewise-constant increasing sequence where the increase occurs when a suitable error criterion is satisfied. Moreover, the step-size is selected as that which yields the fastest convergence. The overall algorithm, the two-scale adaptive (TSA) scheme, is developed for both convex and non-convex stochastic optimization problems. It inherits the exact asymptotic convergence of the stochastic gradient method. More importantly, the optimal error decreasing rate is achieved theoretically, as well as an overall reduction in computational cost. Experimentally, we observe that TSA attains a favorable tradeoff relative to standard SGD that fixes the mini-batch and the step-size, or that only increases the mini-batch or only decreases the step-size.
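The piecewise-constant increasing batch-size can be sketched on a toy problem. The following is a minimal illustration, not the TSA scheme: it estimates the mean of a Gaussian by SGD with a fixed step-size and doubles the mini-batch whenever the mini-batch gradient falls below a noise-floor threshold. The threshold `noise_std / sqrt(batch)`, the doubling rule, and all parameter values are hypothetical stand-ins for the paper's error criterion and step-size selection.

```python
import math
import random

def adaptive_batch_sgd(true_mean=3.0, noise_std=1.0, step=0.5,
                       steps=100, max_batch=256, seed=0):
    """SGD on 0.5 * E[(w - z)^2] with a batch-size that grows over time."""
    rng = random.Random(seed)
    w, batch = 0.0, 1
    batch_history = []
    for _ in range(steps):
        samples = [rng.gauss(true_mean, noise_std) for _ in range(batch)]
        grad = w - sum(samples) / batch  # mini-batch gradient estimate
        w -= step * grad
        batch_history.append(batch)
        # Hypothetical criterion: once the gradient is within the sampling
        # noise of the current batch, a larger batch is needed to progress.
        if abs(grad) < noise_std / math.sqrt(batch) and batch < max_batch:
            batch *= 2
    return w, batch_history

w_final, batch_history = adaptive_batch_sgd()
```

Small batches are used while the iterate is far from the optimum and the gradient signal dominates the noise; larger batches are spent only once the noise becomes the limiting factor.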