As assembly tasks grow in complexity, collaboration among multiple robots becomes essential for task completion. However, centralized task planning has become inadequate for adapting to the increasing intelligence and versatility of robots, along with rising customized orders. There is a need for efficient and automated planning mechanisms capable of coordinating diverse robots for collaborative assembly. To this end, we propose a Stackelberg game-theoretic learning approach. By leveraging Stackelberg games, we characterize robot collaboration through leader-follower interaction to enhance strategy seeking and ensure task completion. To enhance applicability across tasks, we introduce a novel multi-agent learning algorithm: Stackelberg double deep Q-learning, which facilitates automated assembly strategy seeking and multi-robot coordination. Our approach is validated through simulated assembly tasks. Comparison with three alternative multi-agent learning methods shows that our approach achieves the shortest task completion time for tasks. Furthermore, our approach exhibits robustness against both accidental and deliberate environmental perturbations.
Shared control allows the human driver to collaborate with an assistive driving system while retaining the ability to make decisions and take control if necessary. However, human-vehicle teaming and planning are challenging due to environmental uncertainties, the human's bounded rationality, and the variability in human behaviors. An effective collaboration plan needs to learn and adapt to these uncertainties. To this end, we develop a Stackelberg meta-learning algorithm to create automated learning-based planning for shared control. The Stackelberg games are used to capture the leader-follower structure in the asymmetric interactions between the human driver and the assistive driving system. The meta-learning algorithm generates a common behavioral model, which is capable of fast adaptation using a small amount of driving data to assist optimal decision-making. We use a case study of an obstacle avoidance driving scenario to corroborate that the adapted human behavioral model can successfully assist the human driver in reaching the target destination. Besides, it saves driving time compared with a driver-only scheme and is also robust to drivers' bounded rationality and errors.
Robot allocation plays an essential role in facilitating robotic service provision across various domains. Yet the increasing number of users and the uncertainties regarding the users' true service requirements have posed challenges for the service provider in effectively allocating service robots to users to meet their needs. In this work, we first propose a contract-based approach to enable incentive-compatible service selection so that the service provider can effectively reduce the user's service uncertainties for precise service provision. Then, we develop a distributed allocation algorithm that incorporates robot dynamics and collision avoidance to allocate service robots and address scalability concerns associated with increasing numbers of service robots and users. We conduct simulations in eight scenarios to validate our approach. Comparative analysis against the robust allocation paradigm and two alternative uncertainty reduction strategies demonstrates that our approach achieves better allocation efficiency and accuracy.
We are currently facing unprecedented cyber warfare with the rapid evolution of tactics, increasing asymmetry of intelligence, and the growing accessibility of hacking tools. In this landscape, cyber deception emerges as a critical component of our defense strategy against increasingly sophisticated attacks. This chapter aims to highlight the pivotal role of game-theoretic models and foundation models (FMs) in analyzing, designing, and implementing cyber deception tactics. Game models (GMs) serve as a foundational framework for modeling diverse adversarial interactions, allowing us to encapsulate both adversarial knowledge and domain-specific insights. Meanwhile, FMs serve as the building blocks for creating tailored machine learning models suited to given applications. By leveraging the synergy between GMs and FMs, we can advance proactive and automated cyber defense mechanisms by not only securing our networks against attacks but also enhancing their resilience against well-planned operations. This chapter discusses the games at the tactical, operational, and strategic levels of warfare, delves into the symbiotic relationship between these methodologies, and explores relevant applications where such a framework can make a substantial impact in cybersecurity. The chapter discusses the promising direction of the multi-agent neurosymbolic conjectural learning (MANSCOL), which allows the defender to predict adversarial behaviors, design adaptive defensive deception tactics, and synthesize knowledge for the operational level synthesis and adaptation. FMs serve as pivotal tools across various functions for MANSCOL, including reinforcement learning, knowledge assimilation, formation of conjectures, and contextual representation. This chapter concludes with a discussion of the challenges associated with FMs and their application in the domain of cybersecurity.
Asymmetric information stochastic games (\textsc{aisg}s) arise in many complex socio-technical systems, such as cyber-physical systems and IT infrastructures. Existing computational methods for \textsc{aisg}s are primarily offline and can not adapt to equilibrium deviations. Further, current methods are limited to special classes of \textsc{aisg}s to avoid belief hierarchies. To address these limitations, we propose conjectural online learning (\textsc{col}), an online method for generic \textsc{aisg}s. \textsc{col} uses a forecaster-actor-critic (\textsc{fac}) architecture where subjective forecasts are used to conjecture the opponents' strategies within a lookahead horizon, and Bayesian learning is used to calibrate the conjectures. To adapt strategies to nonstationary environments, \textsc{col} uses online rollout with cost function approximation (actor-critic). We prove that the conjectures produced by \textsc{col} are asymptotically consistent with the information feedback in the sense of a relaxed Bayesian consistency. We also prove that the empirical strategy profile induced by \textsc{col} converges to the Berk-Nash equilibrium, a solution concept characterizing rationality under subjectivity. Experimental results from an intrusion response use case demonstrate \textsc{col}'s superiority over state-of-the-art reinforcement learning methods against nonstationary attacks.
We study automated security response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed, non-stationary game. We relax the standard assumption that the game model is correctly specified and consider that each player has a probabilistic conjecture about the model, which may be misspecified in the sense that the true model has probability 0. This formulation allows us to capture uncertainty about the infrastructure and the intents of the players. To learn effective game strategies online, we design a novel method where a player iteratively adapts its conjecture using Bayesian learning and updates its strategy through rollout. We prove that the conjectures converge to best fits, and we provide a bound on the performance improvement that rollout enables with a conjectured model. To characterize the steady state of the game, we propose a variant of the Berk-Nash equilibrium. We present our method through an advanced persistent threat use case. Simulation studies based on testbed measurements show that our method produces effective security strategies that adapt to a changing environment. We also find that our method enables faster convergence than current reinforcement learning techniques.
Offline reinforcement learning (RL) leverages previously collected data to extract policies that return satisfying performance in online environments. However, offline RL suffers from the distribution shift between the offline dataset and the online environment. In the multi-agent RL (MARL) setting, this distribution shift may arise from the nonstationary opponents (exogenous agents beyond control) in the online testing who display distinct behaviors from those recorded in the offline dataset. Hence, the key to the broader deployment of offline MARL is the online adaptation to nonstationary opponents. Recent advances in large language models have demonstrated the surprising generalization ability of the transformer architecture in sequence modeling, which prompts one to wonder \textit{whether the offline-trained transformer policy adapts to nonstationary opponents during online testing}. This work proposes the self-confirming loss (SCL) in offline transformer training to address the online nonstationarity, which is motivated by the self-confirming equilibrium (SCE) in game theory. The gist is that the transformer learns to predict the opponents' future moves based on which it acts accordingly. As a weaker variant of Nash equilibrium (NE), SCE (equivalently, SCL) only requires local consistency: the agent's local observations do not deviate from its conjectures, leading to a more adaptable policy than the one dictated by NE focusing on global optimality. We evaluate the online adaptability of the self-confirming transformer (SCT) by playing against nonstationary opponents employing a variety of policies, from the random one to the benchmark MARL policies. Experimental results demonstrate that SCT can adapt to nonstationary opponents online, achieving higher returns than vanilla transformers and offline MARL baselines.
Guided trajectory planning involves a leader robotic agent strategically directing a follower robotic agent to collaboratively reach a designated destination. However, this task becomes notably challenging when the leader lacks complete knowledge of the follower's decision-making model. There is a need for learning-based methods to effectively design the cooperative plan. To this end, we develop a Stackelberg game-theoretic approach based on Koopman operator to address the challenge. We first formulate the guided trajectory planning problem through the lens of a dynamic Stackelberg game. We then leverage Koopman operator theory to acquire a learning-based linear system model that approximates the follower's feedback dynamics. Based on this learned model, the leader devises a collision-free trajectory to guide the follower, employing receding horizon planning. We use simulations to elaborate the effectiveness of our approach in generating learning models that accurately predict the follower's multi-step behavior when compared to alternative learning techniques. Moreover, our approach successfully accomplishes the guidance task and notably reduces the leader's planning time to nearly half when contrasted with the model-based baseline method.
In the area of learning-driven artificial intelligence advancement, the integration of machine learning (ML) into self-driving (SD) technology stands as an impressive engineering feat. Yet, in real-world applications outside the confines of controlled laboratory scenarios, the deployment of self-driving technology assumes a life-critical role, necessitating heightened attention from researchers towards both safety and efficiency. To illustrate, when a self-driving model encounters an unfamiliar environment in real-time execution, the focus must not solely revolve around enhancing its anticipated performance; equal consideration must be given to ensuring its execution or real-time adaptation maintains a requisite level of safety. This study introduces an algorithm for online meta-reinforcement learning, employing lookahead symbolic constraints based on \emph{Neurosymbolic Meta-Reinforcement Lookahead Learning} (NUMERLA). NUMERLA proposes a lookahead updating mechanism that harmonizes the efficiency of online adaptations with the overarching goal of ensuring long-term safety. Experimental results demonstrate NUMERLA confers the self-driving agent with the capacity for real-time adaptability, leading to safe and self-adaptive driving under non-stationary urban human-vehicle interaction scenarios.