Sridhar Mahadevan

Universal Decision Models

Oct 28, 2021
Sridhar Mahadevan

Humans are universal decision makers: we reason causally to understand the world; we act competitively to gain advantage in commerce, games, and war; and we are able to learn to make better decisions through trial and error. In this paper, we propose the Universal Decision Model (UDM), a mathematical formalism based on category theory. Decision objects in a UDM correspond to instances of decision tasks, ranging from causal models and dynamical systems such as Markov decision processes and predictive state representations, to network multiplayer games and Witsenhausen's intrinsic models, which generalize all these previous formalisms. A UDM is a category of objects, which include decision objects, observation objects, and solution objects. Bisimulation morphisms, which map between decision objects, capture structure-preserving abstractions. We formulate universal properties of UDMs, including information integration, decision solvability, and hierarchical abstraction. We describe universal functorial representations of UDMs, and propose an algorithm for computing the minimal object in a UDM using algebraic topology. We sketch out an application of UDMs to causal inference in network economics, using a complex multiplayer producer-consumer two-sided marketplace.
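The abstract's minimal object is computed with algebraic topology, which is not reproduced here. As a classical point of reference for what "minimal under bisimulation" means, the sketch below uses a different, standard technique: partition refinement on a finite, deterministic MDP. The function names, the `step(s, a) -> (reward, next_state)` interface, and the 4-state example are all hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


def _relabel(signatures: Dict[State, Hashable], states: Sequence[State]) -> Dict[State, int]:
    """Map each distinct signature to a block id, numbered in order of first appearance."""
    ids: Dict[Hashable, int] = {}
    blocks: Dict[State, int] = {}
    for s in states:
        blocks[s] = ids.setdefault(signatures[s], len(ids))
    return blocks


def bisimulation_blocks(
    states: Sequence[State],
    actions: Sequence[Action],
    step: Callable[[State, Action], Tuple[float, State]],
) -> Dict[State, int]:
    """Coarsest bisimulation of a finite, deterministic MDP via partition refinement.

    step(s, a) returns (reward, next_state). Two states share a block exactly when,
    for every action, they earn the same reward and move into the same block.
    Rewards are compared exactly, so this sketch suits small, hand-built examples.
    """
    # Initial partition: group states by their reward vector over all actions.
    block = _relabel({s: tuple(step(s, a)[0] for a in actions) for s in states}, states)
    while True:
        # New signature = old block plus the block reached under each action.
        sig = {s: (block[s], tuple(block[step(s, a)[1]] for a in actions)) for s in states}
        refined = _relabel(sig, states)
        # Refinement only ever splits blocks, so an unchanged count means a fixed point.
        if len(set(refined.values())) == len(set(block.values())):
            return refined
        block = refined


# Hypothetical example: two disjoint 2-cycles (0 <-> 1 and 2 <-> 3) with matching rewards.
transitions = {0: (1.0, 1), 1: (0.0, 0), 2: (1.0, 3), 3: (0.0, 2)}
blocks = bisimulation_blocks([0, 1, 2, 3], ["a"], lambda s, a: transitions[s])
```

Here states 0 and 2 collapse into one block and states 1 and 3 into another, so the quotient is a 2-state system that behaves identically to the original.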

Causal Inference in Network Economics

Sep 20, 2021
Sridhar Mahadevan

Network economics is the study of a rich class of equilibrium problems that occur in the real world, from traffic management to supply chains and two-sided online marketplaces. In this paper we explore causal inference in network economics, building on the mathematical framework of variational inequalities, which is a generalization of classical optimization. Our framework can be viewed as a synthesis of the well-known variational inequality formalism with the broad principles of causal inference.
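The abstract does not spell out a solver. A standard starting point for a variational inequality VI(F, K), i.e. finding x* ∈ K with ⟨F(x*), x − x*⟩ ≥ 0 for all x ∈ K, is the projection method x ← Π_K(x − γF(x)). The sketch below assumes a nonnegative-orthant feasible set and a hypothetical affine map F with an asymmetric matrix; because F is not the gradient of any function, this VI is not the optimality condition of an optimization problem, which is the sense in which VIs generalize optimization.

```python
from typing import Callable, List

Vector = List[float]


def project_nonneg(x: Vector) -> Vector:
    """Euclidean projection onto the feasible set K = {x : x >= 0}."""
    return [max(v, 0.0) for v in x]


def solve_vi_projection(
    F: Callable[[Vector], Vector],
    x0: Vector,
    step: float = 0.2,
    max_iters: int = 10_000,
    tol: float = 1e-12,
) -> Vector:
    """Basic projection method: iterate x <- P_K(x - step * F(x)).

    Converges when F is strongly monotone and Lipschitz and the step is small enough.
    """
    x = list(x0)
    for _ in range(max_iters):
        fx = F(x)
        x_next = project_nonneg([xi - step * fi for xi, fi in zip(x, fx)])
        if max(abs(a - b) for a, b in zip(x_next, x)) < tol:
            return x_next
        x = x_next
    return x


# Hypothetical affine map F(x) = M x + q. M is not symmetric, so F is not a gradient,
# but its symmetric part is 2I, which makes F strongly monotone.
M = [[2.0, 1.0], [-1.0, 2.0]]
q = [-4.0, 3.0]


def F(x: Vector) -> Vector:
    return [sum(m * xj for m, xj in zip(row, x)) + qi for row, qi in zip(M, q)]


x_star = solve_vi_projection(F, [0.0, 0.0])
```

The solution is x* = (2, 0) with F(x*) = (0, 1): the second coordinate sits on the boundary of K with a strictly positive F component, the complementarity pattern typical of equilibrium problems.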

* 12 pages

Asymptotic Causal Inference

Sep 20, 2021
Sridhar Mahadevan

We investigate causal inference in the asymptotic regime as the number of variables approaches infinity using an information-theoretic framework. We define the structural entropy of a causal model in terms of its description complexity, measured by the logarithmic growth rate, in bits, of the number of directed acyclic graphs (DAGs), parameterized by the edge density d. Structural entropy yields non-intuitive predictions. If we randomly sample a DAG from the space of all models with d in the range (0, 1/8), almost surely the model is a two-layer DAG! Semantic entropy quantifies the reduction in entropy when edges are removed by causal intervention. Semantic causal entropy is defined as the f-divergence between the observational distribution and the interventional distribution P', where a subset S of edges is intervened on to determine their causal influence. We compare the decomposability properties of semantic entropy for different choices of f-divergences, including KL-divergence, squared Hellinger distance, and total variation distance. We apply our framework to generalize a recently popular bipartite experimental design for studying causal inference on large datasets, where interventions are carried out on one set of variables (e.g., power plants, items in an online store), but outcomes are measured on a disjoint set of variables (residents near power plants, or shoppers). We generalize bipartite designs to k-partite designs, and describe an optimization framework for finding the optimal k-level DAG architecture for any value of d ∈ (0, 1/2). As edge density increases, a sequence of phase transitions occurs over disjoint intervals of d, with deeper DAG architectures emerging for larger values of d. We also give a quantitative bound on the number of samples needed to reliably test for average causal influence for a k-partite design.
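Both ingredients of the abstract can be made concrete on small cases. The sketch below illustrates them under stated assumptions and is not the paper's definitions: `num_dags` counts all labeled DAGs on n nodes with Robinson's recurrence (the size of the model space before any edge-density parameterization), and `f_divergence` evaluates KL divergence, squared Hellinger distance (using the 1/2 normalization, an assumed convention), and total variation between two discrete distributions. The distributions `P` and `P_int` are hypothetical.

```python
import math
from functools import lru_cache
from typing import Callable, Sequence


@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes via Robinson's recurrence (exact integers)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * math.comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


def f_divergence(p: Sequence[float], q: Sequence[float], f: Callable[[float], float]) -> float:
    """D_f(P || Q) = sum_x Q(x) * f(P(x) / Q(x)); assumes Q(x) > 0 for every x."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


def f_kl(t: float) -> float:
    # f(t) = t log t gives KL(P || Q) in nats (0 log 0 taken as 0).
    return t * math.log(t) if t > 0 else 0.0


def f_hellinger_sq(t: float) -> float:
    # f(t) = (sqrt(t) - 1)^2 / 2 gives H^2 = 1/2 * sum (sqrt(p) - sqrt(q))^2.
    return 0.5 * (math.sqrt(t) - 1.0) ** 2


def f_tv(t: float) -> float:
    # f(t) = |t - 1| / 2 gives total variation 1/2 * sum |p - q|.
    return 0.5 * abs(t - 1.0)


# Hypothetical observational P and interventional P' over a 2-valued outcome.
P = [0.5, 0.5]
P_int = [0.25, 0.75]
divergences = {
    name: f_divergence(P, P_int, f)
    for name, f in [("kl", f_kl), ("hellinger_sq", f_hellinger_sq), ("tv", f_tv)]
}
```

With this normalization the three satisfy H² ≤ TV ≤ sqrt(KL / 2), the second inequality being Pinsker's. The DAG counts grow very quickly (1, 1, 3, 25, 543, 29281, ... for n = 0, 1, 2, ...), which is why the abstract works with a logarithmic growth rate.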

* 16 pages

Multiscale Manifold Warping

Sep 19, 2021
Sridhar Mahadevan, Anup Rao, Georgios Theocharous, Jennifer Healey

Many real-world applications require aligning two temporal sequences, including bioinformatics, handwriting recognition, activity recognition, and human-robot coordination. Dynamic Time Warping (DTW) is a popular alignment method, but it can fail on high-dimensional real-world data, where the dimensions of the aligned sequences are often unequal. In this paper, we show that exploiting the multiscale manifold latent structure of real-world data can yield improved alignment. We introduce a novel framework called Warping on Wavelets (WOW) that integrates DTW with a multiscale manifold learning framework called Diffusion Wavelets. We present a theoretical analysis of the WOW family of algorithms and show that it outperforms previous state-of-the-art methods, such as canonical time warping (CTW) and manifold warping, on several real-world datasets.
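The WOW algorithm itself (DTW combined with diffusion wavelets) is not reproduced here. For reference, the classical DTW recurrence it builds on can be sketched as follows, assuming 1-D sequences and an absolute-difference local cost; the paper's approach instead aligns sequences using multiscale latent structure, which this sketch omits.

```python
from typing import List, Sequence, Tuple


def dtw(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic dynamic time warping with |x_i - y_j| as the local cost.

    Returns the minimal cumulative cost and one optimal warping path
    as a list of index pairs (i, j).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Match, or repeat an element of x, or repeat an element of y.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from (n, m) to (0, 0) along the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    return D[n][m], path[::-1]


cost, path = dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

The repeated `2.0` in the second sequence is absorbed by mapping index 1 of the first sequence to two indices, so the alignment cost is zero even though the lengths differ.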

* 18 pages

Finite-Sample Analysis of Proximal Gradient TD Algorithms

Jul 03, 2020
Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik

In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been little work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms with respect to a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms is indeed comparable to the existing LSTD methods in off-policy learning scenarios.
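In the saddle-point view described above, the value weights θ act as the primal variable and an auxiliary vector w as the dual variable. A minimal sketch of the standard GTD2 update with linear features, written in that primal/dual form, is below; the step sizes, the off-policy ratio argument `rho`, and the single-state test problem are illustrative assumptions, and the projection and step-size schedules analyzed in the paper are omitted.

```python
from typing import List, Sequence, Tuple

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gtd2_step(
    theta: Vec, w: Vec, phi: Vec, phi_next: Vec, reward: float,
    gamma: float, alpha: float, beta: float, rho: float = 1.0,
) -> Tuple[Vec, Vec]:
    """One GTD2 update for a linear value function V(s) = theta . phi(s).

    theta plays the primal role and w the dual role of the saddle-point view.
    rho is the importance ratio pi(a|s) / mu(a|s) (1.0 for on-policy samples).
    """
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    phi_w = dot(phi, w)
    # Both updates are computed from the old theta and w.
    new_theta = [t + alpha * rho * (p - gamma * pn) * phi_w
                 for t, p, pn in zip(theta, phi, phi_next)]
    new_w = [x + beta * (rho * delta - phi_w) * p for x, p in zip(w, phi)]
    return new_theta, new_w


# Hypothetical single-state chain: reward 1 per step and gamma = 0.9, so V = 10.
theta, w = [0.0], [0.0]
for _ in range(5000):
    theta, w = gtd2_step(theta, w, [1.0], [1.0], 1.0, gamma=0.9, alpha=0.5, beta=0.5)
```

Each step costs time linear in the number of features, which is the complexity advantage over least-squares TD that these papers emphasize.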

* 31st Conference on Uncertainty in Artificial Intelligence (UAI). arXiv admin note: substantial text overlap with arXiv:2006.03976

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

Jun 06, 2020
Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal "mirror maps" to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms is comparable to, and may indeed be preferable to, existing least-squares TD methods for off-policy learning, due to its linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
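With the Euclidean distance as the mirror map, a mirror-prox step reduces to the extragradient method: take a look-ahead step, re-evaluate the update direction there, then move from the original point. The sketch below assumes that Euclidean case and reuses the same sample for both evaluations; the paper's general mirror maps and step-size choices are not reproduced, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gtd2_directions(theta: Vec, w: Vec, phi: Vec, phi_next: Vec, reward: float,
                    gamma: float, rho: float = 1.0) -> Tuple[Vec, Vec]:
    """GTD2 update directions for the primal (theta) and dual (w) variables."""
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)
    d_theta = [rho * (p - gamma * pn) * phi_w for p, pn in zip(phi, phi_next)]
    d_w = [(rho * delta - phi_w) * p for p in phi]
    return d_theta, d_w


def gtd2_extragradient_step(theta: Vec, w: Vec, phi: Vec, phi_next: Vec, reward: float,
                            gamma: float, alpha: float, beta: float,
                            rho: float = 1.0) -> Tuple[Vec, Vec]:
    """Extragradient step, i.e. mirror-prox with a Euclidean mirror map.

    1) Take a look-ahead step from (theta, w).
    2) Re-evaluate the directions at the look-ahead point (same sample, an assumption).
    3) Apply those directions starting from the ORIGINAL (theta, w).
    """
    d_theta, d_w = gtd2_directions(theta, w, phi, phi_next, reward, gamma, rho)
    theta_mid = [t + alpha * d for t, d in zip(theta, d_theta)]
    w_mid = [x + beta * d for x, d in zip(w, d_w)]
    d_theta_mid, d_w_mid = gtd2_directions(theta_mid, w_mid, phi, phi_next, reward, gamma, rho)
    return ([t + alpha * d for t, d in zip(theta, d_theta_mid)],
            [x + beta * d for x, d in zip(w, d_w_mid)])


# Hypothetical single-state chain: reward 1 per step and gamma = 0.9, so V = 10.
theta, w = [0.0], [0.0]
for _ in range(5000):
    theta, w = gtd2_extragradient_step(theta, w, [1.0], [1.0], 1.0,
                                       gamma=0.9, alpha=0.5, beta=0.5)
```

The look-ahead evaluation is what distinguishes this from plain GTD2; on saddle-point problems it counteracts the rotational component of the primal-dual dynamics.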

* Journal of Artificial Intelligence Research (JAIR)

Regularized Off-Policy TD-Learning

Jun 06, 2020
Bo Liu, Sridhar Mahadevan, Ji Liu

We present a novel ℓ1-regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is presented, and a variety of experiments illustrate the off-policy convergence, sparse feature selection capability, and low computational cost of the RO-TD algorithm.
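The sparsity in ℓ1-regularized first-order methods typically comes from soft-thresholding, the proximal operator of the ℓ1 norm. The sketch below shows only that ingredient inside a generic proximal-gradient step; it is not RO-TD's saddle-point update, and `l1_prox_step` with its `direction` argument (standing in for any TD update direction such as TDC's) is hypothetical.

```python
import math
from typing import List, Sequence


def soft_threshold(x: Sequence[float], lam: float) -> List[float]:
    """Proximal operator of lam * ||x||_1: shrink each entry toward 0 by lam, zeroing small ones."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]


def l1_prox_step(theta: Sequence[float], direction: Sequence[float],
                 step: float, lam: float) -> List[float]:
    """Generic proximal-gradient step for an l1-regularized objective.

    `direction` stands for any first-order TD update direction (for example from
    TDC or GTD2). This is NOT the RO-TD saddle-point update, only its l1 ingredient.
    """
    moved = [t + step * d for t, d in zip(theta, direction)]
    return soft_threshold(moved, step * lam)


# Hypothetical update: one feature gets a strong signal, the other a weak one.
theta = l1_prox_step([0.0, 0.0], direction=[1.0, 0.05], step=1.0, lam=0.1)
```

The weakly supported feature is set exactly to zero rather than merely shrunk, which is how ℓ1 regularization performs feature selection online.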

* 26th Advances in Neural Information Processing Systems (NIPS). arXiv admin note: substantial text overlap with arXiv:1405.6757

Finite-Sample Analysis of GTD Algorithms

Jun 06, 2020
Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik

* 31st Conference on Uncertainty in Artificial Intelligence (UAI). arXiv admin note: substantial text overlap with arXiv:2006.03976

Optimizing for the Future in Non-Stationary MDPs

Jun 02, 2020
Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process (MDP) is stationary. However, this assumption is often violated in practical real-world applications. We discuss how current methods can have inherent limitations for non-stationary MDPs, and why searching for a policy that is good for the future, unknown MDP therefore requires rethinking the optimization paradigm. To address this problem, we develop a method that builds upon ideas from both counterfactual reasoning and curve-fitting to proactively search for a good future policy, without ever modeling the underlying non-stationarity. Interestingly, we observe that minimizing performance over some of the data from past episodes might be beneficial when searching for a policy that maximizes future performance. The effectiveness of the proposed method is demonstrated on problems motivated by real-world applications.
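The observation that minimizing performance on some past episodes can help is easiest to see with the simplest possible curve fit. The sketch below assumes an ordinary least-squares linear trend over per-episode performance estimates J_0, ..., J_{n-1} (in practice these would come from counterfactual, e.g. importance-sampling, estimates, which are omitted here); the function names are hypothetical. The forecast turns out to be a fixed weighted sum of past performances in which the oldest episodes receive negative weights.

```python
from typing import List, Sequence


def linear_forecast_weights(n: int, horizon: int = 1) -> List[float]:
    """Weights w_i such that sum_i w_i * J_i is the least-squares linear-trend
    forecast at episode n - 1 + horizon, given estimates J_0..J_{n-1}.

    Requires n >= 2 so that the time index has nonzero variance.
    """
    ts = list(range(n))
    t_bar = sum(ts) / n
    s_tt = sum((t - t_bar) ** 2 for t in ts)
    t_star = n - 1 + horizon
    # OLS prediction at t_star: mean(J) + slope * (t_star - t_bar), expanded per sample.
    return [1.0 / n + (t_star - t_bar) * (t - t_bar) / s_tt for t in ts]


def forecast(perf: Sequence[float], horizon: int = 1) -> float:
    """Forecast future performance as a weighted sum of past performance estimates."""
    weights = linear_forecast_weights(len(perf), horizon)
    return sum(wi * ji for wi, ji in zip(weights, perf))


weights = linear_forecast_weights(3)
```

For three past episodes the weights are (−2/3, 1/3, 4/3), so a policy chosen to maximize the forecast is rewarded for lower performance on the oldest episode, matching the counterintuitive effect described in the abstract.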

* Thirty-seventh International Conference on Machine Learning (ICML 2020)
