Donghwan Lee

Suppressing Overestimation in Q-Learning through Adversarial Behaviors

Oct 10, 2023

HyeAnn Lee, Donghwan Lee

Abstract:The goal of this paper is to propose a new Q-learning algorithm with a dummy adversarial player, called dummy adversarial Q-learning (DAQ), that can effectively regulate the overestimation bias in standard Q-learning. With the dummy player, the learning can be formulated as a two-player zero-sum game. The proposed DAQ unifies several Q-learning variations that control overestimation biases, such as maxmin Q-learning and minmax Q-learning (proposed in this paper), in a single framework. DAQ is a simple but effective way to suppress the overestimation bias through dummy adversarial behaviors and can be easily applied to off-the-shelf reinforcement learning algorithms to improve their performance. The finite-time convergence of DAQ is analyzed from an integrated perspective by adapting adversarial Q-learning. The performance of the suggested DAQ is empirically demonstrated on various benchmark environments.
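The overestimation bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the DAQ algorithm itself; it assumes a toy setting in which all true action values are zero and every estimator sees independent zero-mean Gaussian noise, and it compares the standard max-based bootstrap value with a maxmin-style value (minimum over several estimators, then maximum over actions). The function names and parameter values are hypothetical.

```python
import random
import statistics


def max_target(estimates):
    """Standard Q-learning bootstrap value: max over actions of one estimate."""
    return max(estimates)


def maxmin_target(estimate_sets):
    """Maxmin-style value: min over estimators per action, then max over actions."""
    num_actions = len(estimate_sets[0])
    return max(min(q[a] for q in estimate_sets) for a in range(num_actions))


def simulate(num_actions=5, num_estimators=4, trials=2000, noise=1.0, seed=0):
    """Average both bootstrap values when every true action value is 0 (assumed toy setup)."""
    rng = random.Random(seed)
    single, maxmin = [], []
    for _ in range(trials):
        # Each estimator observes the true values (all 0) plus zero-mean Gaussian noise.
        sets = [[rng.gauss(0.0, noise) for _ in range(num_actions)]
                for _ in range(num_estimators)]
        single.append(max_target(sets[0]))
        maxmin.append(maxmin_target(sets))
    return statistics.mean(single), statistics.mean(maxmin)


mean_single, mean_maxmin = simulate()
print(f"mean max target:    {mean_single:.3f}")
print(f"mean maxmin target: {mean_maxmin:.3f}")
```

Although the true value is zero, the plain max is positive on average in this toy setup, while the maxmin-style value is lower, which is the kind of bias regulation the abstract refers to.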

A primal-dual perspective for distributed TD-learning

Oct 01, 2023

Han-Dong Lim, Donghwan Lee

Abstract:The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual ordinary differential equation (ODE) dynamics subject to null-space constraints. Based on the exponential convergence behavior of these primal-dual ODE dynamics, we examine the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and incorporating both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the assumption that the underlying communication network structure is characterized by a doubly stochastic matrix.

On the Local Quadratic Stability of T-S Fuzzy Systems in the Vicinity of the Origin

Sep 14, 2023

Donghwan Lee, Do Wan Kim

Abstract:The main goal of this paper is to introduce new local stability conditions for continuous-time Takagi-Sugeno (T-S) fuzzy systems. These stability conditions are based on linear matrix inequalities (LMIs) in combination with quadratic Lyapunov functions. Moreover, they integrate information on the membership functions at the origin and effectively leverage the linear structure of the underlying nonlinear system in the vicinity of the origin. As a result, the proposed conditions are proved to be less conservative compared to existing methods using fuzzy Lyapunov functions in the literature. Moreover, we establish that the proposed methods offer necessary and sufficient conditions for the local exponential stability of T-S fuzzy systems. The paper also includes discussions on the inherent limitations associated with fuzzy Lyapunov approaches. To demonstrate the theoretical results, we provide comprehensive examples that elucidate the core concepts and validate the efficacy of the proposed conditions.

An O.D.E. Framework of Distributed TD-Learning for Networked Multi-Agent Markov Decision Processes

Aug 17, 2023

Donghwan Lee, Han-Dong Lim, Do Wan Kim

Abstract:The primary objective of this paper is to investigate distributed ordinary differential equation (ODE) and distributed temporal difference (TD) learning algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where individual agents have access only to their own rewards and have no knowledge of the rewards of other agents. Additionally, each agent can share its parameters with neighboring agents through a communication network, represented by a graph. Our contributions can be summarized in two key points: 1) We introduce novel distributed ODEs, inspired by the averaging consensus method in the continuous-time domain. The convergence of the ODEs is assessed from a control-theoretic perspective. 2) Building upon these ODEs, we devise new distributed TD-learning algorithms. A standout feature of one of our proposed distributed ODEs is its incorporation of two independent dynamic systems, each with a distinct role. This characteristic sets the stage for a novel distributed TD-learning strategy, the convergence of which can potentially be established using the Borkar-Meyn theorem.
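The averaging consensus method that the abstract cites as inspiration can be sketched in a few lines. This is only the classical consensus iteration, written as an Euler discretization of the continuous-time dynamics, and not the paper's distributed ODEs or TD-learning algorithms. The three-agent path graph, the step size, and the function names are assumptions chosen for illustration.

```python
def consensus_step(values, neighbors, step):
    """One Euler step of averaging consensus: each agent moves toward its neighbors."""
    return [x + step * sum(values[j] - x for j in neighbors[i])
            for i, x in enumerate(values)]


def run_consensus(values, neighbors, step=0.2, iterations=200):
    """Repeat the consensus step; on a connected undirected graph the values approach their average."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, step)
    return values


# Hypothetical communication graph: a path 0 - 1 - 2 (undirected).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
final_values = run_consensus([0.0, 3.0, 6.0], neighbors)
print(final_values)
```

Each agent only reads the values of its graph neighbors, which mirrors the communication constraint described in the abstract.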

Temporal Difference Learning with Experience Replay

Jun 16, 2023

Han-Dong Lim, Donghwan Lee

Abstract:Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL). Despite its widespread use, it has only been recently that researchers have begun to actively study its finite time behavior, including the finite time bound on mean squared error and sample complexity. On the empirical side, experience replay has been a key ingredient in the success of deep RL algorithms, but its theoretical effects on RL have yet to be fully understood. In this paper, we present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay. Specifically, under the Markovian observation model, we demonstrate that for both the averaged iterate and final iterate cases, the error term induced by a constant step-size can be effectively controlled by the size of the replay buffer and the mini-batch sampled from the experience replay buffer.
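A minimal sketch of the setting described in the abstract, tabular TD(0) with a replay buffer, uniform mini-batch sampling, and a constant step size, might look as follows. The two-state Markov reward process, the hyperparameters, and the function name are assumptions for illustration and are not taken from the paper.

```python
import random
from collections import deque


def td_with_replay(num_steps=5000, buffer_size=500, batch_size=16,
                   step_size=0.05, gamma=0.5, seed=0):
    """Tabular TD(0) where each update averages TD errors over a replayed mini-batch."""
    rng = random.Random(seed)
    values = [0.0, 0.0]
    buffer = deque(maxlen=buffer_size)  # oldest transitions are discarded first
    state = 0
    for _ in range(num_steps):
        # Assumed toy process: next state is uniform over {0, 1}; reward equals the current state.
        next_state = rng.randrange(2)
        buffer.append((state, float(state), next_state))
        state = next_state

        # Sample a mini-batch uniformly (with replacement) from the buffer.
        batch = [buffer[rng.randrange(len(buffer))] for _ in range(batch_size)]
        updates = [0.0, 0.0]
        for s, r, s_next in batch:
            updates[s] += r + gamma * values[s_next] - values[s]
        for s in range(2):
            values[s] += step_size * updates[s] / batch_size
    return values


estimated_values = td_with_replay()
print(estimated_values)
```

For this toy process with gamma = 0.5, the true values are 0.5 and 1.5, so the estimates can be compared against them when varying `buffer_size` and `batch_size`.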

Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach

Jun 12, 2023

Donghwan Lee

Abstract:The objective of this paper is to investigate the finite-time analysis of a Q-learning algorithm applied to two-player zero-sum Markov games. Specifically, we establish a finite-time analysis of both the minimax Q-learning algorithm and the corresponding value iteration method. To enhance the analysis of both value iteration and Q-learning, we employ the switching system model of minimax Q-learning and the associated value iteration. This approach provides further insights into minimax Q-learning and facilitates a more straightforward and insightful convergence analysis. We anticipate that the introduction of these additional insights has the potential to uncover novel connections and foster collaboration between concepts in the fields of control theory and reinforcement learning communities.

* arXiv admin note: text overlap with arXiv:2205.05455

Optimal Heterogeneous Collaborative Linear Regression and Contextual Bandits

Jun 09, 2023

Xinmeng Huang, Kan Xu, Donghwan Lee, Hamed Hassani, Hamsa Bastani, Edgar Dobriban

Abstract:Large and complex datasets are often collected from several, possibly heterogeneous sources. Collaborative learning methods improve efficiency by leveraging commonalities across datasets while accounting for possible differences among them. Here we study collaborative linear regression and contextual bandits, where each instance's associated parameters are equal to a global parameter plus a sparse instance-specific term. We propose a novel two-stage estimator called MOLAR that leverages this structure by first constructing an entry-wise median of the instances' linear regression estimates, and then shrinking the instance-specific estimates towards the median. MOLAR improves the dependence of the estimation error on the data dimension, compared to independent least squares estimates. We then apply MOLAR to develop methods for sparsely heterogeneous collaborative contextual bandits, which lead to improved regret guarantees compared to independent bandit methods. We further show that our methods are minimax optimal by providing a number of lower bounds. Finally, we support the efficiency of our methods by performing experiments on both synthetic data and the PISA dataset on student educational outcomes from heterogeneous countries.
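The two-stage idea in the abstract (entry-wise median, then shrinkage toward it) can be sketched as below. The abstract does not specify the shrinkage rule, so this sketch assumes hard thresholding: an entry is replaced by the median unless it differs from the median by more than a threshold. The per-instance estimates are taken as given, and the function names and the threshold value are hypothetical.

```python
import statistics


def entrywise_median(estimates):
    """Stage 1: coordinate-wise median across the instances' estimates."""
    dim = len(estimates[0])
    return [statistics.median(est[j] for est in estimates) for j in range(dim)]


def shrink_toward_median(estimates, threshold):
    """Stage 2 (assumed rule): keep an entry only if it is far from the median."""
    center = entrywise_median(estimates)
    return [[c if abs(e - c) <= threshold else e for e, c in zip(est, center)]
            for est in estimates]


# Hypothetical per-instance regression estimates; instance 2 differs in one coordinate.
estimates = [
    [1.0, 2.0, 0.1],
    [1.1, 2.1, 0.0],
    [0.9, 5.0, -0.1],
]
shrunk = shrink_toward_median(estimates, threshold=0.5)
print(shrunk)
```

In this example only the second coordinate of the third instance survives as instance-specific, which matches the sparse-heterogeneity structure the abstract describes.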

TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

Mar 27, 2023

Jaehoon Choi, Dongki Jung, Taejae Lee, Sangwook Kim, Youngdong Jung, Dinesh Manocha, Donghwan Lee

Abstract:We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone which offers access to images, depth maps, and valid poses. Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps and refines camera poses guided by corresponding depth. Then, we adopt the neural implicit surface reconstruction method, which allows for high-quality mesh and develops a new training process for applying a regularization provided by classical multi-view stereo methods. Moreover, we apply a differentiable rendering to fine-tune incomplete texture maps and generate textures which are perceptually closer to the original scene. Our pipeline can be applied to any common objects in the real world without the need for either in-the-lab environments or accurate mask images. We demonstrate results of captured objects with complex shapes and validate our method numerically against existing 3D reconstruction and texture mapping methods.

* Accepted to CVPR23. Project Page: https://jh-choi.github.io/TMO/

Backstepping Temporal Difference Learning

Feb 28, 2023

Han-Dong Lim, Donghwan Lee

Abstract:Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence issues when the off-policy scheme is used together with linear function approximation. To overcome the divergent behavior, several off-policy TD-learning algorithms, including gradient-TD learning (GTD) and TD-learning with correction (TDC), have been developed. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective, and propose a new convergent algorithm. Our method relies on the backstepping technique, which is widely used in nonlinear control theory. Finally, convergence of the proposed algorithm is experimentally verified in environments where the standard TD-learning is known to be unstable.

Demystifying Disagreement-on-the-Line in High Dimensions

Jan 31, 2023

Donghwan Lee, Behrad Moniri, Xinmeng Huang, Edgar Dobriban, Hamed Hassani

Abstract:Evaluating the performance of machine learning models under distribution shift is challenging, especially when we only have unlabeled data from the shifted (target) domain, along with labeled data from the original (source) domain. Recent work suggests that the notion of disagreement, the degree to which two models trained with different randomness differ on the same input, is key to tackling this problem. Experimentally, disagreement and prediction error have been shown to be strongly connected, which has been used to estimate model performance. Experiments have led to the discovery of the disagreement-on-the-line phenomenon, whereby the classification error under the target domain is often a linear function of the classification error under the source domain; and whenever this property holds, disagreement under the source and target domain follow the same linear relation. In this work, we develop a theoretical foundation for analyzing disagreement in high-dimensional random features regression; and study under what conditions the disagreement-on-the-line phenomenon occurs in our setting. Experiments on CIFAR-10-C, Tiny ImageNet-C, and Camelyon17 are consistent with our theory and support the universality of the theoretical findings.
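For classification, one common way to instantiate the disagreement described in the abstract is the fraction of inputs on which two models predict different labels; notably, it requires no ground-truth labels. The sketch below assumes this definition (the paper's regression setting may define it differently), and the prediction lists are made-up illustrative data.

```python
def disagreement_rate(predictions_a, predictions_b):
    """Fraction of inputs on which two models output different labels (no true labels needed)."""
    if len(predictions_a) != len(predictions_b):
        raise ValueError("both models must be evaluated on the same inputs")
    differing = sum(a != b for a, b in zip(predictions_a, predictions_b))
    return differing / len(predictions_a)


# Hypothetical predictions of two independently trained models on the same unlabeled inputs.
model_a_source = [0, 1, 1, 2]
model_b_source = [0, 1, 2, 2]
model_a_target = [0, 2, 1, 2]
model_b_target = [1, 1, 2, 2]

print(disagreement_rate(model_a_source, model_b_source))  # 0.25
print(disagreement_rate(model_a_target, model_b_target))  # 0.75
```

Computing this quantity on both source and target data gives the pair of values whose linear relation the disagreement-on-the-line phenomenon concerns.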
