The integration of autonomous vehicles into urban and highway environments necessitates the development of robust and adaptable behavior planning systems. This study presents an innovative approach to address this challenge by utilizing a Monte-Carlo Tree Search (MCTS) based algorithm for autonomous driving behavior planning. The core objective is to leverage the balance between exploration and exploitation inherent in MCTS to facilitate intelligent driving decisions in complex scenarios. We introduce an MCTS-based algorithm tailored to the specific demands of autonomous driving. This involves the integration of carefully crafted cost functions, encompassing safety, comfort, and passability metrics, into the MCTS framework. The effectiveness of our approach is demonstrated by enabling autonomous vehicles to navigate intricate scenarios, such as intersections, unprotected left turns, cut-ins, and ramps, even under traffic congestion, in real-time. Qualitative instances illustrate the integration of diverse driving decisions, such as lane changes, acceleration, and deceleration, into the MCTS framework. Moreover, quantitative results, derived from examining the impact of iteration time and look-ahead steps on decision quality and real-time applicability, substantiate the robustness of our approach. This robustness is further underscored by the high success rate of the MCTS algorithm across various scenarios.
Recently a line of researches has delved the use of graph neural networks (GNNs) for decentralized control in swarm robotics. However, it has been observed that relying solely on the states of immediate neighbors is insufficient to imitate a centralized control policy. To address this limitation, prior studies proposed incorporating $L$-hop delayed states into the computation. While this approach shows promise, it can lead to a lack of consensus among distant flock members and the formation of small clusters, consequently resulting in the failure of cohesive flocking behaviors. Instead, our approach leverages spatiotemporal GNN, named STGNN that encompasses both spatial and temporal expansions. The spatial expansion collects delayed states from distant neighbors, while the temporal expansion incorporates previous states from immediate neighbors. The broader and more comprehensive information gathered from both expansions results in more effective and accurate predictions. We develop an expert algorithm for controlling a swarm of robots and employ imitation learning to train our decentralized STGNN model based on the expert algorithm. We simulate the proposed STGNN approach in various settings, demonstrating its decentralized capacity to emulate the global expert algorithm. Further, we implemented our approach to achieve cohesive flocking, leader following and obstacle avoidance by a group of Crazyflie drones. The performance of STGNN underscores its potential as an effective and reliable approach for achieving cohesive flocking, leader following and obstacle avoidance tasks.
We investigate the problem of energy-constrained planning for a cooperative system of an Unmanned Ground Vehicles (UGV) and an Unmanned Aerial Vehicle (UAV). In scenarios where the UGV serves as a mobile base to ferry the UAV and as a charging station to recharge the UAV, we formulate a novel energy-constrained routing problem. To tackle this problem, we design an energy-aware routing algorithm, aiming to minimize the overall mission duration under the energy limitations of both vehicles. The algorithm first solves a Traveling Salesman Problem (TSP) to generate a guided tour. Then, it employs the Monte-Carlo Tree Search (MCTS) algorithm to refine the tour and generate paths for the two vehicles. We evaluate the performance of our algorithm through extensive simulations and a proof-of-concept experiment. The results show that our algorithm consistently achieves near-optimal mission time and maintains fast running time across a wide range of problem instances.
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as ``pick up a cup on a kitchen table" or ``navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.
We study the problem of assigning robots with actions to track targets. The objective is to optimize the robot team's tracking quality which can be defined as the reduction in the uncertainty of the targets' states. Specifically, we consider two assignment problems given the different sensing capabilities of the robots. In the first assignment problem, a single robot is sufficient to track a target. To this end, we present a greedy algorithm (Algorithm 1) that assigns a robot with its action to each target. We prove that the greedy algorithm has a 1/2 approximation bound and runs in polynomial time. Then, we study the second assignment problem where two robots are necessary to track a target. We design another greedy algorithm (Algorithm 2) that assigns a pair of robots with their actions to each target. We prove that the greedy algorithm achieves a 1/3 approximation bound and has a polynomial running time. Moreover, we illustrate the performance of the two greedy algorithms in the ROS-Gazebo environment where the tracking patterns of one robot following one target using Algorithm 1 and two robots following one target using Algorithm 2 are clearly observed. Further, we conduct extensive comparisons to demonstrate that the two greedy algorithms perform close to their optimal counterparts and much better than their respective (1/2 and 1/3) approximation bounds.
In this work, we study the problem of decentralized multi-agent perimeter defense that asks for computing actions for defenders with local perceptions and communications to maximize the capture of intruders. One major challenge for practical implementations is to make perimeter defense strategies scalable for large-scale problem instances. To this end, we leverage graph neural networks (GNNs) to develop an imitation learning framework that learns a mapping from defenders' local perceptions and their communication graph to their actions. The proposed GNN-based learning network is trained by imitating a centralized expert algorithm such that the learned actions are close to that generated by the expert algorithm. We demonstrate that our proposed network performs closer to the expert algorithm and is superior to other baseline algorithms by capturing more intruders. Our GNN-based network is trained at a small scale and can be generalized to large-scale cases. We run perimeter defense games in scenarios with different team sizes and configurations to demonstrate the performance of the learned network.
Autoencoders are unsupervised neural networks that are used to process and compress input data and then reconstruct the data back to the original data size. This allows autoencoders to be used for different processing applications such as data compression, image classification, image noise reduction, and image coloring. Hardware-wise, re-configurable architectures like Field Programmable Gate Arrays (FPGAs) have been used for accelerating computations from several domains because of their unique combination of flexibility, performance, and power efficiency. In this paper, we look at the different autoencoders available and use the convolutional autoencoder in both FPGA and GPU-based implementations to process noisy static MNIST images. We compare the different results achieved with the FPGA and GPU-based implementations and then discuss the pros and cons of each implementation. The evaluation of the proposed design achieved 80%accuracy and our experimental results show that the proposed accelerator achieves a throughput of 21.12 Giga-Operations Per Second (GOP/s) with a 5.93 W on-chip power consumption at 100 MHz. The comparison results with off-the-shelf devices and recent state-of-the-art implementations illustrate that the proposed accelerator has obvious advantages in terms of energy efficiency and design flexibility. We also discuss future work that can be done with the use of our proposed accelerator.
A large number of pornographic audios publicly available on the Internet seriously threaten the mental and physical health of children, but these audios are rarely detected and filtered. In this paper, we firstly propose a convolutional neural networks (CNN) based model for acoustic pornography recognition. Then, we research a collection of refinements and verify their effectiveness through ablation studies. Finally, we stack all refinements together to verify whether they can further improve the accuracy of the model. Experimental results on our newly-collected large dataset consisting of 224127 pornographic audios and 274206 normal samples demonstrate the effectiveness of our proposed model and these refinements. Specifically, the proposed model achieves an accuracy of 92.46% and the accuracy is further improved to 97.19% when all refinements are combined.
We study the problem that requires a team of robots to perform joint localization and target tracking task while ensuring team connectivity and collision avoidance. The problem can be formalized as a nonlinear, non-convex optimization program, which is typically hard to solve. To this end, we design a two-staged approach that utilizes a greedy algorithm to optimize the joint localization and target tracking performance and applies control barrier functions to ensure safety constraints, i.e., maintaining connectivity of the robot team and preventing inter-robot collisions. Simulated Gazebo experiments verify the effectiveness of the proposed approach. We further compare our greedy algorithm to a non-linear optimization solver and a random algorithm, in terms of the joint localization and tracking quality as well as the computation time. The results demonstrate that our greedy algorithm achieves high task quality and runs efficiently.
Traditional approaches for active mapping focus on building geometric maps. For most real-world applications, however, actionable information is related to semantically meaningful objects in the environment. We propose an approach to the active metric-semantic mapping problem that enables multiple heterogeneous robots to collaboratively build a map of the environment. The robots actively explore to minimize the uncertainties in both semantic (object classification) and geometric (object modeling) information. We represent the environment using informative but sparse object models, each consisting of a basic shape and a semantic class label, and characterize uncertainties empirically using a large amount of real-world data. Given a prior map, we use this model to select actions for each robot to minimize uncertainties. The performance of our algorithm is demonstrated through multi-robot experiments in diverse real-world environments. The proposed framework is applicable to a wide range of real-world problems, such as precision agriculture, infrastructure inspection, and asset mapping in factories. A demo video can be found at https://youtu.be/S86SgXi54oU.