Abstract:Stochastic Optimal Control Problems (SOCPs) play a major role in sequential decision-making. Various iterative algorithms, developed under the framework of the stochastic maximum principle, find the optimal control sequentially. However, they rely on adjoint sensitivity analysis, which requires simulating an adjoint process, typically a backward stochastic differential equation (SDE) that must simultaneously be adapted to a forward filtration and satisfy a terminal condition. This substantially increases complexity and exacerbates the curse of dimensionality. We instead develop a stochastic maximum principle based on the Malliavin calculus, which enables us to devise an iterative algorithm that does not need an adjoint process. Our algorithm does, however, require the Malliavin derivative, which can be computed efficiently with a forward simulator. Empirical comparisons against standard iterative algorithms demonstrate that our approach alleviates the dimensionality bottleneck while delivering competitive performance on the considered SOCPs.
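As a rough illustration of how a Malliavin derivative can come out of a purely forward simulation, the sketch below uses a scalar geometric Brownian motion, a toy model chosen here and not taken from the paper. It relies on the standard identity D_t X_T = Y_T Y_t^{-1} σ(X_t) for t ≤ T, where Y is the first variation process of the SDE. The function name, parameters, and the Euler-Maruyama discretization are all assumptions made for illustration; this is not the authors' algorithm.

```python
import numpy as np

def simulate_gbm_with_malliavin(x0=1.0, mu=0.05, sigma=0.2, T=1.0,
                                n_steps=100, seed=0):
    """Forward Euler-Maruyama simulation of dX = mu*X dt + sigma*X dW,
    together with its first variation process Y (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)

    X = np.empty(n_steps + 1)
    Y = np.empty(n_steps + 1)
    X[0], Y[0] = x0, 1.0
    for k in range(n_steps):
        # For this model b'(x) = mu and sigma'(x) = sigma, so Y obeys the
        # same linear recursion as X, started from 1 instead of x0.
        growth = 1.0 + mu * dt + sigma * dW[k]
        X[k + 1] = X[k] * growth
        Y[k + 1] = Y[k] * growth

    # Malliavin derivative D_t X_T on the time grid: Y_T / Y_t * sigma(X_t).
    D = Y[-1] / Y * sigma * X
    return X, D

X, D = simulate_gbm_with_malliavin()
# For geometric Brownian motion the closed form is D_t X_T = sigma * X_T.
print(np.allclose(D, 0.2 * X[-1]))
```

Both X and Y are advanced forward in time along the same noise path, so no backward equation or terminal condition is involved in this toy computation.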
Abstract:In this paper, we introduce two identities: one pertaining to the state space of Poisson Point Processes (PPPs), and the other to the Voronoi tessellations formed by PPPs. We then explore several applications of these identities in the context of wireless cellular networks.
Abstract:Many sequential decision-making problems require optimizing several objectives that may conflict with each other. The conventional way to handle such a multi-task problem is to build a scalar objective function as a linear combination of the objectives. However, when the conflicting objectives have different scales, this method requires trial and error to find suitable weights for the combination. As a result, in most cases this approach cannot guarantee a Pareto-optimal solution. In this paper, we develop a single-agent, scale-independent multi-objective reinforcement learning algorithm based on the Advantage Actor-Critic (A2C) algorithm. We then carry out a convergence analysis of the devised multi-objective algorithm, which provides a convergence-in-mean guarantee. Finally, we evaluate the proposed algorithm experimentally on a multi-task problem. Simulation results show that the developed multi-objective A2C approach outperforms the single-objective algorithm.
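To make the scale issue with linear scalarization concrete, here is a minimal sketch with two hypothetical actions and two objectives whose magnitudes differ by roughly three orders. All numbers, names, and weights are invented for illustration and do not come from the paper; the sketch only shows why the weights have to be tuned to the scales, not how the proposed algorithm avoids this.

```python
import numpy as np

def scalarize(rewards, weights):
    """Linear scalarization: weighted sum of the objective values."""
    return float(np.dot(weights, rewards))

def pick(actions, weights):
    """Return the name of the action with the largest scalarized reward."""
    return max(actions, key=lambda name: scalarize(actions[name], weights))

# Hypothetical rewards: objective 1 is in the hundreds, objective 2 lies in [0, 1].
actions = {
    "a": np.array([1000.0, 0.1]),
    "b": np.array([900.0, 0.9]),
}

equal_weights = np.array([0.5, 0.5])     # ignores the scale mismatch
rescaled_weights = np.array([1e-3, 1.0])  # hand-tuned to balance the scales

print(pick(actions, equal_weights))     # objective 1 dominates the sum
print(pick(actions, rescaled_weights))  # objective 2 now influences the choice
```

With equal weights the large-scale objective decides the outcome alone, and the preferred action flips once the weights are rescaled by hand, which is the trial-and-error step the abstract refers to.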
Abstract:In this paper, we consider a class of nonlinear constrained optimization problems. We formulate the problem as a time-varying optimization problem using continuous-time parametric functions and derive a dynamical system that tracks the optimal solution. We then re-parameterize the dynamical system to express it as a linear combination of the parametric functions. Calculus of variations is applied to optimize the parametric functions such that the optimality distance of the solution is minimized. Accordingly, an iterative dynamic algorithm is devised to find the solution with an efficient convergence rate. We benchmark the proposed algorithm against the prediction-correction method in terms of optimality and computational complexity.




Abstract:We consider a hybrid delivery scheme for streaming content, combining cache-enabled Orthogonal Multipoint Multicast (OMPMC) and on-demand Single-Point Unicast (SPUC) transmissions for heterogeneous networks. The OMPMC service transmits cached files throughout the whole network to interested users, and users not satisfied by this service are assigned to the SPUC service to be served individually. The SPUC service fetches the requested files from the core network and unicasts them to user equipments (UEs) using cellular beamforming transmissions. We optimize the delivery scheme to minimize the average resource consumption in the network. We formulate a constrained optimization problem over the cache placement and resource allocation of the OMPMC component, as well as the multi-user beamforming scheme of the SPUC component. We apply a path-following method to find the optimal traffic offloading solution. The solutions reveal a tradeoff between the total amount of consumed resources and the service outage probability. Simulation results show that the hybrid scheme provides a better tradeoff between the amount of network-wide consumed resources and the service outage probability than schemes from the literature.