In this paper, we consider a team of mobile robots executing simultaneously multiple behaviors by different subgroups, while maintaining global and subgroup line-of-sight (LOS) network connectivity that minimally constrains the original multi-robot behaviors. The LOS connectivity between pairwise robots is preserved when two robots stay within the limited communication range and their LOS remains occlusion-free from static obstacles while moving. By using control barrier functions (CBF) and minimum volume enclosing ellipsoids (MVEE), we first introduce the LOS connectivity barrier certificate (LOS-CBC) to characterize the state-dependent admissible control space for pairwise robots, from which their resulting motion will keep the two robots LOS connected over time. We then propose the Minimum Line-of-Sight Connectivity Constraint Spanning Tree (MLCCST) as a step-wise bilevel optimization framework to jointly optimize (a) the minimum set of LOS edges to actively maintain, and (b) the control revision with respect to a nominal multi-robot controller due to LOS connectivity maintenance. As proved in the theoretical analysis, this allows the robots to improvise the optimal composition of LOS-CBC control constraints that are least constraining around the nominal controllers, and at the same time enforce the global and subgroup LOS connectivity through the resulting preserved set of pairwise LOS edges. The framework thus leads to robots staying as close to their nominal behaviors, while exhibiting dynamically changing LOS-connected network topology that provides the greatest flexibility for the existing multi-robot tasks in real time. We demonstrate the effectiveness of our approach through simulations with up to 64 robots.
Prior work has looked at applying reinforcement learning and imitation learning approaches to autonomous driving scenarios, but either the safety or the efficiency of the algorithm is compromised. With the use of control barrier functions embedded into the reinforcement learning policy, we arrive at safe policies to optimize the performance of the autonomous driving vehicle. However, control barrier functions need a good approximation of the model of the car. We use probabilistic control barrier functions as an estimate of the model uncertainty. The algorithm is implemented as an online version in the CARLA (Dosovitskiy et al., 2017) Simulator and as an offline version on a dataset extracted from the NGSIM Database. The proposed algorithm is not just a safe ramp merging algorithm but a safe autonomous driving algorithm applied to address ramp merging on highways.
Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categories, proportions of the distribution, or an existing corpus following the desired distributions. However, many important distributions, such as personal preferences, are unquantified. In this work, we tackle the problem of generating text following arbitrary distributions (quantified and unquantified) by proposing Nano, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback. Nano achieves state-of-the-art results on single topic/attribute as well as quantified distribution control compared to previous works. We also show that Nano is able to learn unquantified distributions, achieves personalization, and captures differences between different individuals' personal preferences with high sample efficiency.
The promise of multimodal models for real-world applications has inspired research in visualizing and understanding their internal mechanics with the end goal of empowering stakeholders to visualize model behavior, perform model debugging, and promote trust in machine learning models. However, modern multimodal models are typically black-box neural networks, which makes it challenging to understand their internal mechanics. How can we visualize the internal modeling of multimodal interactions in these models? Our paper aims to fill this gap by proposing MultiViz, a method for analyzing the behavior of multimodal models by scaffolding the problem of interpretability into 4 stages: (1) unimodal importance: how each modality contributes towards downstream modeling and prediction, (2) cross-modal interactions: how different modalities relate with each other, (3) multimodal representations: how unimodal and cross-modal interactions are represented in decision-level features, and (4) multimodal prediction: how decision-level features are composed to make a prediction. MultiViz is designed to operate on diverse modalities, models, tasks, and research areas. Through experiments on 8 trained models across 6 real-world tasks, we show that the complementary stages in MultiViz together enable users to (1) simulate model predictions, (2) assign interpretable concepts to features, (3) perform error analysis on model misclassifications, and (4) use insights from error analysis to debug models. MultiViz is publicly available, will be regularly updated with new interpretation tools and metrics, and welcomes inputs from the community.
This paper introduces a novel social preference-aware decentralized safe control framework to address the responsibility allocation problem in multi-agent collision avoidance. Considering that agents do not necessarily cooperate in symmetric ways, this paper focuses on semi-cooperative behavior among heterogeneous agents with varying cooperation levels. Drawing upon the idea of Social Value Orientation (SVO) for quantifying the individual selfishness, we propose a novel concept of Responsibility-associated Social Value Orientation (R-SVO) to express the intended relative social implications between pairwise agents. This is used to redefine each agent's social preferences or personalities in terms of corresponding responsibility shares in contributing to the coordination scenario, such as semi-cooperative collision avoidance where all agents interact in an asymmetric way. By incorporating such relative social implications through proposed Local Pairwise Responsibility Weights, we develop a Responsibility-associated Control Barrier Function-based safe control framework for individual agents, and multi-agent collision avoidance is achieved with formally provable safety guarantees. Simulations are provided to demonstrate the effectiveness and efficiency of the proposed framework in several multi-agent navigation tasks, such as a position-swapping game, a self-driving car highway ramp merging scenario, and a circular position swapping game.
With the increasing need for safe control in the domain of autonomous driving, model-based safety-critical control approaches are widely used, especially Control Barrier Function (CBF)-based approaches. Among them, Exponential CBF (eCBF) is particularly popular due to its realistic applicability to high-relative-degree systems. However, for most of the optimization-based controllers utilizing CBF-based constraints, solution feasibility is a common issue arising from potential conflict among different constraints. Moreover, how to incorporate uncertainty into the eCBF-based constraints in high-relative-degree systems to account for safety remains an open challenge. In this paper, we present a novel approach to extend an eCBF-based safe critical controller to a probabilistic setting to handle potential motion uncertainty from system dynamics. More importantly, we leverage an optimization-based technique to provide a solution feasibility guarantee in run time, while ensuring probabilistic safety. Lane changing and intersection handling are demonstrated as two use cases, and experiment results are provided to show the effectiveness of the proposed approach.
Learning multimodal representations involves discovering correspondences and integrating information from multiple heterogeneous sources of data. While recent research has begun to explore the design of more general-purpose multimodal models (contrary to prior focus on domain and modality-specific architectures), these methods are still largely focused on a small set of modalities in the language, vision, and audio space. In order to accelerate generalization towards diverse and understudied modalities, we investigate methods for high-modality (a large set of diverse modalities) and partially-observable (each task only defined on a small subset of modalities) scenarios. To tackle these challenges, we design a general multimodal model that enables multitask and transfer learning: multitask learning with shared parameters enables stable parameter counts (addressing scalability), and cross-modal transfer learning enables information sharing across modalities and tasks (addressing partial observability). Our resulting model generalizes across text, image, video, audio, time-series, sensors, tables, and set modalities from different research areas, improves the tradeoff between performance and efficiency, transfers to new modalities and tasks, and reveals surprising insights on the nature of information sharing in multitask models. We release our code and benchmarks which we hope will present a unified platform for subsequent theoretical and empirical analysis: https://github.com/pliang279/HighMMT.
The ability for a human to understand an Artificial Intelligence (AI) model's decision-making process is critical in enabling stakeholders to visualize model behavior, perform model debugging, promote trust in AI models, and assist in collaborative human-AI decision-making. As a result, the research fields of interpretable and explainable AI have gained traction within AI communities as well as interdisciplinary scientists seeking to apply AI in their subject areas. In this paper, we focus on advancing the state-of-the-art in interpreting multimodal models - a class of machine learning methods that tackle core challenges in representing and capturing interactions between heterogeneous data sources such as images, text, audio, and time-series data. Multimodal models have proliferated numerous real-world applications across healthcare, robotics, multimedia, affective computing, and human-computer interaction. By performing model disentanglement into unimodal contributions (UC) and multimodal interactions (MI), our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models while maintaining generality across arbitrary modalities, model architectures, and tasks. Through a comprehensive suite of experiments on both synthetic and real-world multimodal tasks, we show that DIME generates accurate disentangled explanations, helps users of multimodal models gain a deeper understanding of model behavior, and presents a step towards debugging and improving these models for real-world deployment. Code for our experiments can be found at https://github.com/lvyiwei1/DIME.
With the increasing emphasis on the safe autonomy for robots, model-based safe control approaches such as Control Barrier Functions have been extensively studied to ensure guaranteed safety during inter-robot interactions. In this paper, we introduce the Parametric Control Barrier Function (Parametric-CBF), a novel variant of the traditional Control Barrier Function to extend its expressivity in describing different safe behaviors among heterogeneous robots. Instead of assuming cooperative and homogeneous robots using the same safe controllers, the ego robot is able to model the neighboring robots' underlying safe controllers through different Parametric-CBFs with observed data. Given learned parametric-CBF and proved forward invariance, it provides greater flexibility for the ego robot to better coordinate with other heterogeneous robots with improved efficiency while enjoying formally provable safety guarantees. We demonstrate the usage of Parametric-CBF in behavior prediction and adaptive safe control in the ramp merging scenario from the applications of autonomous driving. Compared to traditional CBF, Parametric-CBF has the advantage of capturing varying drivers' characteristics given richer description of robot behavior in the context of safe control. Numerical simulations are given to validate the effectiveness of the proposed method.
Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9/15 datasets. Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized code, and leaderboards are publicly available, will be regularly updated, and welcomes inputs from the community.