Abstract: When operating in service of people, robots need to optimize rewards aligned with end-user preferences. Since robots will rely on raw perceptual inputs like RGB images, their rewards will inevitably use visual representations. Recently, there has been excitement about using representations from pre-trained visual models, but key to making these work in robotics is fine-tuning, which is typically done via proxy tasks like dynamics prediction or enforcing temporal cycle-consistency. However, all these proxy tasks bypass the human's input on what matters to them, exacerbating spurious correlations and ultimately leading to robot behaviors that are misaligned with user preferences. In this work, we propose that robots should leverage human feedback to align their visual representations with the end-user and disentangle what matters for the task. We introduce Representation-Aligned Preference-based Learning (RAPL), a method for solving both the visual representation alignment problem and the visual reward learning problem through the lens of preference-based learning and optimal transport. Across experiments in X-MAGICAL and in robotic manipulation, we find that RAPL's reward consistently generates preferred robot behaviors with high sample efficiency, and shows strong zero-shot generalization when the visual representation is learned from a different embodiment than the robot's.
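The abstract does not spell out the reward computation, but the optimal-transport ingredient can be pictured with a minimal sketch: embed the robot's observations and a human-preferred trajectory with the (preference-aligned) encoder, then score the robot trajectory by the negative entropy-regularized OT cost between the two sets of embeddings. The `encoder` callable, frame arrays, and Sinkhorn hyperparameters below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sinkhorn_cost(x, y, reg=0.1, n_iters=200):
    """Entropy-regularized optimal transport cost between two sets of embeddings."""
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)   # pairwise distances (n, m)
    a = np.full(len(x), 1.0 / len(x))                               # uniform marginals
    b = np.full(len(y), 1.0 / len(y))
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):                                        # Sinkhorn iterations
        v = b / (K.T @ u + 1e-9)
        u = a / (K @ v + 1e-9)
    plan = u[:, None] * K * v[None, :]                              # transport plan
    return float(np.sum(plan * cost))

def ot_visual_reward(encoder, robot_frames, preferred_frames):
    """Reward a robot trajectory by how closely its embedded observations match a
    human-preferred trajectory under optimal transport (hypothetical sketch)."""
    z_robot = encoder(robot_frames)        # (T, d) features from the aligned encoder
    z_pref = encoder(preferred_frames)     # (T', d)
    return -sinkhorn_cost(z_robot, z_pref)
```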
Abstract: Humans have internal models of robots (like their physical capabilities), the world (like what will happen next), and their tasks (like a preferred goal). However, human internal models are not always perfect: for example, it is easy to underestimate a robot's inertia. Nevertheless, these models change and improve over time as humans gather more experience. Interestingly, robot actions influence what this experience is, and therefore influence how people's internal models change. In this work, we take a step towards enabling robots to understand the influence they have, leverage it to better assist people, and help human models more quickly align with reality. Our key idea is to model the human's learning as a nonlinear dynamical system which evolves the human's internal model given new observations. We formulate a novel optimization problem to infer the human's learning dynamics from demonstrations that naturally exhibit human learning. We then formalize how robots can influence human learning by embedding the human's learning dynamics model into the robot planning problem. Although our formulations provide concrete problem statements, they are intractable to solve in full generality. We contribute an approximation that sacrifices the complexity of the human internal models we can represent, but enables robots to learn the nonlinear dynamics of these internal models. We evaluate our inference and planning methods in a suite of simulated environments and an in-person user study, where a 7-DoF robotic arm teaches participants to be better teleoperators. While influencing human learning remains an open problem, our results demonstrate that this influence is possible and can be helpful in real human-robot interaction.
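As a rough illustration of the modeling idea (not the paper's exact parameterization), the human's internal model can be summarized by a parameter vector theta, and the learning dynamics by a small network f that maps (theta, observation) to the next theta; fitting f to demonstration sequences where theta has been inferred at each step yields a nonlinear learning-dynamics model the robot can roll out during planning. The dimensions, network, and the assumption that a theta sequence is already available are all hypothetical.

```python
import torch
import torch.nn as nn

class HumanLearningDynamics(nn.Module):
    """Sketch: theta summarizes the human's internal model; f predicts how it
    changes after each new observation."""

    def __init__(self, theta_dim=4, obs_dim=6, hidden=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(theta_dim + obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, theta_dim),
        )

    def forward(self, theta, obs):
        # next internal model = current model + learned nonlinear update
        return theta + self.f(torch.cat([theta, obs], dim=-1))

def fit_learning_dynamics(model, thetas, observations, epochs=500, lr=1e-3):
    """Fit f to one demonstration in which the human's internal model visibly improves.
    thetas: (T, theta_dim) tensor of inferred internal models; observations: (T-1, obs_dim)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        pred = model(thetas[:-1], observations)      # predict theta_{t+1} from (theta_t, o_t)
        loss = ((pred - thetas[1:]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```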
Abstract: In this work, we explore whether modeling recurrence into the Transformer architecture can be both beneficial and efficient, by building an extremely simple recurrent module into the Transformer. We compare our model to baselines following the training and evaluation recipe of BERT. Our results confirm that recurrence can indeed improve Transformer models by a consistent margin, without requiring low-level performance optimizations, and while keeping the number of parameters constant. For example, our base model achieves an absolute improvement of 2.1 points averaged across 10 tasks, and also demonstrates increased stability in fine-tuning over a range of learning rates.
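The abstract describes the recurrent module only as extremely simple; the sketch below wires a GRU between self-attention and the feed-forward block of a standard encoder layer purely to illustrate the idea of adding recurrence to a Transformer. A GRU does add parameters, so treat this as a structural illustration rather than the paper's parameter-matched design.

```python
import torch
import torch.nn as nn

class RecurrentTransformerLayer(nn.Module):
    """Transformer encoder layer with a simple recurrent module (hypothetical sketch)."""

    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)   # the recurrent module
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):          # x: (batch, seq_len, d_model)
        a, _ = self.attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + a)
        r, _ = self.gru(x)                          # recurrence along the sequence dimension
        x = self.norm2(x + r)
        return self.norm3(x + self.ff(x))
```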
Abstract: An outstanding challenge with safety methods for human-robot interaction is reducing their conservatism while maintaining robustness to variations in human behavior. In this work, we propose that robots use confidence-aware game-theoretic models of human behavior when assessing the safety of a human-robot interaction. By treating the influence between the human and the robot, as well as the human's rationality, as unobserved latent states, we succinctly infer the degree to which a human is following the game-theoretic interaction model. We leverage this model to restrict the set of feasible human controls during safety verification, enabling the robot to confidently modulate the conservatism of its safety monitor online. Evaluations in simulated human-robot scenarios and ablation studies demonstrate that imbuing safety monitors with confidence-aware game-theoretic models enables both safe and efficient human-robot interaction. Moreover, evaluations with real traffic data show that our safety monitor is less conservative than traditional safety methods in real human driving scenarios.
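One way to picture the confidence mechanism (a simplified sketch, not the paper's full latent-state model): maintain a belief over a rationality coefficient in a Boltzmann model of the human's game-theoretic Q-values, update it from observed human actions, and keep only the human controls that are sufficiently likely under the belief-averaged policy when running safety verification. All names and thresholds below are illustrative.

```python
import numpy as np

def boltzmann_policies(betas, q_values):
    """Boltzmann policy over human actions for each candidate rationality value."""
    logits = np.outer(betas, q_values)                          # (n_beta, n_actions)
    policies = np.exp(logits - logits.max(axis=1, keepdims=True))
    return policies / policies.sum(axis=1, keepdims=True)

def update_rationality_belief(belief, betas, q_values, observed_action):
    """Bayesian update of the belief over the human's rationality coefficient."""
    likelihood = boltzmann_policies(betas, q_values)[:, observed_action]
    posterior = belief * likelihood
    return posterior / posterior.sum()

def restricted_control_set(belief, betas, q_values, eps=0.05):
    """Human controls kept for safety verification: actions whose probability under
    the belief-averaged policy exceeds a small threshold."""
    avg_policy = belief @ boltzmann_policies(betas, q_values)
    return np.flatnonzero(avg_policy > eps)
```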
Abstract: A human-centered robot needs to reason about the cognitive limitations and potential irrationality of its human partner to achieve seamless interactions. This paper proposes an anytime game-theoretic planner that integrates iterative reasoning models, a partially observable Markov decision process, and chance-constrained Monte-Carlo belief tree search for robot behavioral planning. Our planner enables a robot to safely and actively reason about its human partner's latent cognitive states (bounded intelligence and irrationality) in real time to better maximize its utility. We validate our approach in an autonomous driving domain where our behavioral planner and a low-level motion controller hierarchically control an autonomous car to negotiate traffic merges. Simulations and user studies demonstrate our planner's effectiveness.
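A minimal sketch of the chance-constrained ingredient (hypothetical, not the paper's full belief tree search): estimate each candidate action's violation probability by sampling the belief over the human's latent cognitive state and rolling out, then prune actions whose estimated risk exceeds the chance constraint.

```python
import numpy as np

def chance_constrained_actions(belief_particles, candidate_actions, violates_safety,
                               risk_cap=0.05, n_samples=200):
    """Prune actions whose estimated violation probability under the current belief
    exceeds risk_cap. `violates_safety(action, human_state)` is a hypothetical rollout
    that returns True if the sampled interaction ends in a safety violation."""
    rng = np.random.default_rng()
    safe_actions = []
    for action in candidate_actions:
        samples = rng.choice(len(belief_particles), size=n_samples)
        risk = np.mean([violates_safety(action, belief_particles[i]) for i in samples])
        if risk <= risk_cap:
            safe_actions.append(action)
    return safe_actions
```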
Abstract: The highly popular Transformer architecture, based on self-attention, is the foundation of large pretrained models such as BERT, which have become an enduring paradigm in NLP. While powerful, the computational resources and time required to pretrain such models can be prohibitive. In this work, we present an alternative self-attention architecture, Shatter, that more efficiently encodes sequence information by softly partitioning the space of relative positions and applying different value matrices to different parts of the sequence. This mechanism further allows us to simplify the multi-headed attention in the Transformer to single-headed. We conduct extensive experiments showing that Shatter achieves better performance than BERT, with pretraining being faster per step (15% on TPU), converging in fewer steps, and offering considerable memory savings (>50%). Taken together, Shatter can be pretrained on 8 V100 GPUs in 7 days and match the performance of BERT_Base -- making the cost of pretraining much more affordable.
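A rough sketch of the core mechanism: single-headed attention where each relative position is softly assigned to a partition, and each partition has its own value matrix. The partition parameterization, shapes, and hyperparameters below are assumptions rather than Shatter's exact design.

```python
import torch
import torch.nn as nn

class SoftPartitionAttention(nn.Module):
    """Single-headed self-attention with relative-position-dependent value matrices
    (hypothetical sketch of the soft-partitioning idea)."""

    def __init__(self, d_model=256, n_parts=4, max_rel=128):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        # one value matrix per partition of the relative-position space
        self.v = nn.Parameter(torch.randn(n_parts, d_model, d_model) * d_model ** -0.5)
        # logits that softly assign each relative offset to a partition
        self.part_logits = nn.Parameter(torch.zeros(2 * max_rel + 1, n_parts))
        self.max_rel = max_rel
        self.scale = d_model ** -0.5

    def forward(self, x):                                   # x: (B, T, D)
        B, T, D = x.shape
        q, k = self.q(x), self.k(x)
        attn = torch.softmax(q @ k.transpose(-1, -2) * self.scale, dim=-1)   # (B, T, T)
        pos = torch.arange(T, device=x.device)
        rel = (pos[:, None] - pos[None, :]).clamp(-self.max_rel, self.max_rel) + self.max_rel
        part = torch.softmax(self.part_logits[rel], dim=-1)                  # (T, T, P)
        vals = torch.einsum('btd,pde->bpte', x, self.v)                      # per-partition values
        # mix each key's partition-specific values by attention and soft membership
        return torch.einsum('bij,ijp,bpje->bie', attn, part, vals)
```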
Abstract: Safety assurance is a critical yet challenging aspect of developing self-driving technologies. Hamilton-Jacobi backward-reachability analysis is a formal verification tool for verifying the safety of dynamical systems in the presence of disturbances. However, the standard approach is too conservative to be applied to self-driving applications due to its worst-case assumption on humans' behaviors (i.e., guarding against worst-case outcomes). In this work, we integrate a learning-based prediction algorithm and a game-theoretic human behavioral model to update the conservativeness of backward-reachability analysis online. We evaluate our approach using real driving data. The results show that, with reasonable assumptions on human behaviors, our approach can effectively reduce the conservativeness of the standard approach without sacrificing its safety verification ability.
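The key move, shrinking the set of human controls the reachability analysis must guard against, can be sketched as follows. The helper names are hypothetical, and the actual Hamilton-Jacobi computation operates on a value function over a state grid rather than this one-step illustration.

```python
import numpy as np

def restricted_human_controls(candidate_controls, predicted_probs, mass=0.95):
    """Keep the smallest set of human controls covering `mass` of the predicted
    probability, instead of guarding against the full worst-case control set."""
    order = np.argsort(predicted_probs)[::-1]           # most likely controls first
    covered = np.cumsum(predicted_probs[order])
    keep = order[: int(np.searchsorted(covered, mass) + 1)]
    return candidate_controls[keep]

def one_step_worst_case(state, robot_control, human_controls, dynamics, value):
    """Backward-reachability-style step: the human picks the most adversarial control,
    but only from the restricted (likely) set. `dynamics` and `value` are hypothetical
    placeholders for the system model and the safety value function."""
    return min(value(dynamics(state, robot_control, u_h)) for u_h in human_controls)
```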
Abstract: The reward function, as an incentive representation that recognizes humans' agency and rationalizes their actions, is particularly appealing for modeling human behavior in human-robot interaction. Inverse Reinforcement Learning is an effective way to retrieve reward functions from demonstrations. However, applying it to multi-agent settings has always been challenging, since the mutual influence between agents has to be appropriately modeled. To tackle this challenge, previous work either exploits equilibrium solution concepts, assuming humans are perfectly rational optimizers with unbounded intelligence, or assigns humans' interaction strategies a priori. In this work, we advocate that humans are boundedly rational and have different intelligence levels when reasoning about others' decision-making processes, and that such an inherent and latent characteristic should be accounted for in reward learning algorithms. Hence, we exploit such insights from Theory of Mind and propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning. We validate our approach in both zero-sum and general-sum games with synthetic agents and illustrate a practical application to learning human drivers' reward functions from real driving data. We compare our approach with two baseline algorithms. The results show that, by reasoning about humans' latent intelligence levels, the proposed approach is more flexible and better able to retrieve reward functions that explain humans' driving behavior.
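A compact illustration of the latent-intelligence-level idea using level-k (iterative) reasoning in a two-player matrix game: in an actual IRL loop, the reward matrices would be parameterized by the unknown reward and adjusted to maximize the demonstration likelihood. The shapes, temperatures, and level prior here are assumptions.

```python
import numpy as np

def softmax(x, beta=1.0):
    z = beta * (x - np.max(x))
    e = np.exp(z)
    return e / e.sum()

def level_k_policies(payoff_h, payoff_o, k_max=3, beta=5.0):
    """Boltzmann level-k policies for a two-player matrix game.

    payoff_h[i, j]: human reward for human action i and other-agent action j;
    payoff_o[i, j]: the other agent's reward for the same joint action.
    Level-0 agents act uniformly; a level-k agent soft-best-responds to a
    level-(k-1) model of its opponent."""
    pi_h = [np.full(payoff_h.shape[0], 1.0 / payoff_h.shape[0])]
    pi_o = [np.full(payoff_o.shape[1], 1.0 / payoff_o.shape[1])]
    for k in range(1, k_max + 1):
        pi_h.append(softmax(payoff_h @ pi_o[k - 1], beta))
        pi_o.append(softmax(pi_h[k - 1] @ payoff_o, beta))
    return pi_h, pi_o

def demonstration_likelihood(observed_action, pi_h, level_prior):
    """Likelihood of a demonstrated human action, marginalizing the latent
    intelligence level."""
    return sum(p * pi[observed_action] for p, pi in zip(level_prior, pi_h))
```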
Abstract: Named Entity Recognition (NER) is one of the first stages in deep language understanding, yet current NER models heavily rely on human-annotated data. In this work, to alleviate the dependence on labeled data, we propose a Local Additivity based Data Augmentation (LADA) method for semi-supervised NER, in which we create virtual samples by interpolating sequences close to each other. Our approach has two variations: Intra-LADA and Inter-LADA, where Intra-LADA performs interpolations among tokens within one sentence, and Inter-LADA samples different sentences to interpolate. Through linear additions between sampled training data, LADA creates an effectively unlimited amount of labeled data and improves both entity and context learning. We further extend LADA to the semi-supervised setting by designing a novel consistency loss for unlabeled data. Experiments conducted on two NER benchmarks demonstrate the effectiveness of our methods over several strong baselines. We have publicly released our code at https://github.com/GT-SALT/LADA.
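The interpolation at the heart of LADA is mixup-style linear addition; a minimal sketch on token representations and one-hot tag sequences is below (exactly where the interpolation is applied inside the NER model, and the consistency loss for unlabeled data, are not specified in the abstract and are omitted here).

```python
import numpy as np

def lada_interpolate(emb_a, labels_a, emb_b, labels_b, alpha=8.0):
    """Create a virtual training sample by linear interpolation (mixup-style sketch).

    emb_*:    (seq_len, hidden) token representations
    labels_*: (seq_len, n_tags) one-hot tag sequences
    Intra-LADA would pair a sentence with a token-permuted copy of itself, while
    Inter-LADA pairs two different (nearby) sentences."""
    lam = np.random.beta(alpha, alpha)
    mixed_emb = lam * emb_a + (1.0 - lam) * emb_b
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b
    return mixed_emb, mixed_labels
```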
Abstract: Classical game-theoretic approaches for multi-agent systems, in both the forward policy learning/design problem and the inverse reward learning problem, often make strong rationality assumptions: agents are perfectly rational expected utility maximizers. Specifically, the agents are risk-neutral to all uncertainties, maximize their expected rewards, and have unlimited computation resources to explore such policies. Such assumptions, however, are substantially at odds with many observed human behaviors, such as satisficing with sub-optimal policies and making risk-seeking or loss-averse decisions. In this paper, we investigate the bounded risk-sensitive Markov Game (BRSMG) and its inverse reward learning problem. Instead of assuming unlimited computation resources, we consider the influence of bounded intelligence by exploiting iterative reasoning models in BRSMG. Instead of assuming agents maximize their expected utilities (a risk-neutral measure), we consider the impact of risk-sensitive measures such as cumulative prospect theory. Convergence analyses of BRSMG are established for both forward policy learning and inverse reward learning. The proposed forward policy learning and inverse reward learning algorithms in BRSMG are validated through a navigation scenario. Simulation results show that the behaviors of agents in BRSMG exhibit both risk-averse and risk-seeking phenomena, which are consistent with observations from humans. Moreover, in the inverse reward learning task, the proposed bounded risk-sensitive inverse learning algorithm outperforms the baseline risk-neutral inverse learning algorithm.
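To make the cumulative-prospect-theory ingredient concrete, here is a self-contained sketch of the CPT value of a discrete outcome distribution, using the standard Tversky-Kahneman value and probability-weighting functions; the parameter values are the commonly cited estimates and are not taken from the paper, and how this measure is embedded into the Markov game is not reproduced here.

```python
import numpy as np

def w(p, gamma):
    """Tversky-Kahneman probability weighting function."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

def cpt_value(outcomes, probs, alpha=0.88, beta=0.88, lam=2.25, gamma=0.61, delta=0.69):
    """Cumulative prospect theory value of a discrete outcome distribution."""
    outcomes, probs = np.asarray(outcomes, float), np.asarray(probs, float)
    order = np.argsort(outcomes)
    x, p = outcomes[order], probs[order]
    # value function: concave for gains, convex and steeper (loss-averse) for losses
    v = np.where(x >= 0, np.abs(x) ** alpha, -lam * np.abs(x) ** beta)
    # rank-dependent decision weights
    dec = np.cumsum(p[::-1])[::-1]                            # P(X >= x_i)
    w_gain = w(dec, gamma) - w(np.append(dec[1:], 0.0), gamma)
    cum = np.cumsum(p)                                        # P(X <= x_i)
    w_loss = w(cum, delta) - w(np.append(0.0, cum[:-1]), delta)
    weights = np.where(x >= 0, w_gain, w_loss)
    return float(np.sum(weights * v))
```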