A number of deep reinforcement-learning (RL) approaches propose to control traffic signals. In this work, we study the robustness of such methods along two axes. First, sensor failures and GPS occlusions create missing-data challenges and we show that recent methods remain brittle in the face of these missing data. Second, we provide a more systematic study of the generalization ability of RL methods to new networks with different traffic regimes. Again, we identify the limitations of recent approaches. We then propose using a combination of distributional and vanilla reinforcement learning through a policy ensemble. Building upon the state-of-the-art previous model which uses a decentralized approach for large-scale traffic signal control with graph convolutional networks (GCNs), we first learn models using a distributional reinforcement learning (DisRL) approach. In particular, we use implicit quantile networks (IQN) to model the state-action return distribution with quantile regression. For traffic signal control problems, an ensemble of standard RL and DisRL yields superior performance across different scenarios, including different levels of missing sensor data and traffic flow patterns. Furthermore, the learning scheme of the resulting model can improve zero-shot transferability to different road network structures, including both synthetic networks and real-world networks (e.g., Luxembourg, Manhattan). We conduct extensive experiments to compare our approach to multi-agent reinforcement learning and traditional transportation approaches. Results show that the proposed method improves robustness and generalizability in the face of missing data, varying road networks, and traffic flows.
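As an illustration of the quantile regression mentioned above, the following is a minimal sketch of the quantile Huber loss commonly used to train implicit quantile networks. It is not taken from the paper's implementation: the function name, the plain-list inputs, the default `kappa=1.0`, and the choice to average over both predicted quantiles and target samples are all assumptions made for readability.

```python
def quantile_huber_loss(predicted, targets, taus, kappa=1.0):
    """Average quantile Huber loss (illustrative sketch, not the paper's code).

    predicted: predicted return quantiles, one per entry of `taus`
    targets:   target return samples (e.g. from a bootstrapped target)
    taus:      quantile fractions in (0, 1), aligned with `predicted`
    kappa:     Huber threshold (assumed default of 1.0)
    """
    total = 0.0
    for theta, tau in zip(predicted, taus):
        for target in targets:
            u = target - theta  # temporal-difference style error
            abs_u = abs(u)
            # Huber loss: quadratic near zero, linear beyond kappa
            if abs_u <= kappa:
                huber = 0.5 * u * u
            else:
                huber = kappa * (abs_u - 0.5 * kappa)
            # Asymmetric weight |tau - 1{u < 0}| turns Huber into quantile regression
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber / kappa
    return total / (len(predicted) * len(targets))
```

With a high quantile fraction such as 0.9, underestimating the target is penalized more heavily than overestimating it, which is what pushes each output toward its own quantile of the return distribution.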
Most recent studies on neural constituency parsing focus on encoder structures, while few developments are devoted to decoders. Previous research has demonstrated that probabilistic statistical methods based on syntactic rules are particularly effective in constituency parsing, whereas syntactic rules are not used during the training of neural models in prior work probably due to their enormous computation requirements. In this paper, we first implement a fast CKY decoding procedure harnessing GPU acceleration, based on which we further derive a syntactic rule-based (rule-constrained) CKY decoding. In the experiments, our method obtains 95.89 and 92.52 F1 on the datasets of PTB and CTB respectively, which shows significant improvements compared with previous approaches. Besides, our parser achieves strong and competitive cross-domain performance in zero-shot settings.
In constituency parsing, span-based decoding is an important direction. However, for Chinese sentences, because of their linguistic characteristics, word segmentation must first be performed by a separate model, which introduces uncertainty and generally propagates errors into the subsequent computation of the constituency tree. This work proposes a method for joint Chinese word segmentation and span-based constituency parsing by adding extra labels to individual Chinese characters on the parse trees. In experiments, the proposed algorithm outperforms recent models for joint segmentation and constituency parsing on CTB 5.1.
We propose a novel algorithm that improves on the previous neural span-based CKY decoder for constituency parsing. In contrast to traditional span-based decoding, where spans are combined based only on the sum of their scores, we introduce an order-sensitive strategy in which the span combination scores are derived on an order-sensitive basis. Our decoder can be regarded as a generalization of existing span-based decoders that uses a finer-grained scoring scheme for combining lower-level spans into higher-level spans: we emphasize the order of the lower-level spans and use order-sensitive span scores as well as order-sensitive combination grammar rule scores to enhance prediction accuracy. We implement the proposed decoding strategy harnessing GPU parallelism and achieve a decoding speed on par with state-of-the-art span-based parsers. Using the previous state-of-the-art model without additional data as our baseline, we outperform it and improve the F1 score on the Penn Treebank dataset by 0.26% and on the Chinese Treebank dataset by 0.35%.
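For reference, the traditional sum-of-scores span-based CKY decoding that this abstract contrasts itself with can be sketched as follows. This is only the baseline recurrence, not the order-sensitive decoder proposed in the paper; the function name, the dictionary-based score table, and the default score of 0.0 for unlisted spans are assumptions for illustration, and labels and GPU batching are omitted.

```python
def cky_decode(span_scores, n):
    """Baseline span-based CKY (illustrative sketch, unlabeled, CPU only).

    span_scores: dict mapping (i, j), 0 <= i < j <= n, to a float score;
                 spans missing from the dict are assumed to score 0.0
    n:           number of words in the sentence (n >= 1)
    Returns (best_total_score, list_of_spans_in_best_binary_tree).
    """
    best = {}   # best[(i, j)]: best score of any binary tree over words i..j
    split = {}  # split[(i, j)]: split point achieving best[(i, j)]

    for i in range(n):
        best[(i, i + 1)] = span_scores.get((i, i + 1), 0.0)

    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Children are combined purely by summing their best scores
            k_best = max(range(i + 1, j),
                         key=lambda k: best[(i, k)] + best[(k, j)])
            split[(i, j)] = k_best
            best[(i, j)] = (span_scores.get((i, j), 0.0)
                            + best[(i, k_best)] + best[(k_best, j)])

    # Recover the spans of the best tree by following the stored split points
    spans, stack = [], [(0, n)]
    while stack:
        i, j = stack.pop()
        spans.append((i, j))
        if j - i > 1:
            k = split[(i, j)]
            stack.append((k, j))
            stack.append((i, k))
    return best[(0, n)], spans
```

Because the split choice depends only on `best[(i, k)] + best[(k, j)]`, the two children are interchangeable from the scorer's point of view; an order-sensitive scheme would replace that sum with a score that also depends on which span is on the left and which is on the right.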
Recent advances in deep reinforcement learning (RL) have demonstrated complex decision-making capabilities in simulation environments such as the Arcade Learning Environment, MuJoCo, and ViZDoom. However, these methods are hardly extensible to more complicated problems, mainly due to the lack of complexity and variation in the environments they are trained and tested on. Furthermore, these environments cannot be extended into open-world settings that would facilitate long-term exploration research. To learn realistic task-solving capabilities, we need to develop an environment with greater diversity and complexity. We developed WILD-SCAV, a powerful and extensible environment based on a 3D open-world FPS (First-Person Shooter) game, to bridge the gap. It provides realistic 3D environments of variable complexity, various tasks, and multiple modes of interaction, where agents can learn to perceive 3D environments, navigate and plan, and compete and cooperate in a human-like manner. WILD-SCAV also supports different complexities, such as configurable maps with different terrains, building structures and distributions, and multi-agent settings with cooperative and competitive tasks. The experimental results on configurable complexity, multi-tasking, and multi-agent scenarios demonstrate the effectiveness of WILD-SCAV in benchmarking various RL algorithms, as well as its potential to give rise to intelligent agents with generalized task-solving abilities. Our open-source code is available at https://github.com/inspirai/wilderness-scavenger.
Axie Infinity is a complicated card game with a huge action space, which makes it difficult to solve using generic reinforcement learning (RL) algorithms. We propose a hybrid RL framework to learn action representations and game strategies. To avoid evaluating every action in the large feasible action set, our method evaluates actions in a fixed-size set that is determined using action representations. We compare our method with two baseline methods in terms of sample efficiency and the winning rates of the trained models. We empirically show that our method achieves the best overall winning rate and the best sample efficiency among the three methods.
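One way to picture evaluating only a fixed-size set of actions chosen through action representations is a nearest-neighbour lookup in embedding space followed by a value-based choice among the candidates. The sketch below makes that idea concrete under stated assumptions: the abstract does not say how the set is built, so the Euclidean nearest-neighbour rule, the `proto` query vector, the function name, and the toy embeddings and values in the usage note are all hypothetical.

```python
import heapq
import math


def select_action(proto, action_embeddings, q_value, k=3):
    """Pick an action from a fixed-size candidate set (illustrative sketch).

    proto:             query point in embedding space (assumed to come from a policy)
    action_embeddings: dict mapping each feasible action to its embedding tuple
    q_value:           callable returning an estimated value for an action
    k:                 size of the candidate set that is actually evaluated
    Returns (chosen_action, candidates).
    """
    # Keep only the k feasible actions whose embeddings lie closest to the query
    candidates = heapq.nsmallest(
        k,
        action_embeddings,
        key=lambda a: math.dist(proto, action_embeddings[a]),
    )
    # Evaluate just those k candidates instead of the full feasible set
    return max(candidates, key=q_value), candidates
```

For example, with embeddings `{"a": (0, 0), "b": (1, 0), "c": (5, 5), "d": (0.5, 0)}`, a query of `(0, 0)`, and `k=2`, only `"a"` and `"d"` are scored, so the cost of action selection depends on `k` rather than on the number of feasible actions.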
In the coming years and decades, autonomous vehicles (AVs) will become increasingly prevalent, offering new opportunities for safer and more convenient travel and potentially smarter traffic control methods that exploit automation and connectivity. Car following is a prime function in autonomous driving. Car following based on reinforcement learning (RL) has received attention in recent years with the goal of learning and achieving performance levels comparable to humans. However, most existing RL methods model car following as a unilateral problem, sensing only the vehicle ahead. Recent literature (Wang and Horn), however, has shown that bilateral car following, which considers both the vehicle ahead and the vehicle behind, exhibits better system stability. In this paper we hypothesize that this bilateral car following can be learned using RL, together with other goals such as efficiency maximization, jerk minimization, and safety rewards, leading to a learned model that outperforms human driving. We propose a Deep Reinforcement Learning (DRL) framework for car following control that integrates bilateral information into both the state and the reward function, based on the bilateral control model (BCM). Furthermore, we use a decentralized multi-agent reinforcement learning framework to generate the corresponding control action for each agent. Our simulation results demonstrate that our learned policy is better than the human driving policy in terms of (a) inter-vehicle headways, (b) average speed, (c) jerk, (d) Time to Collision (TTC) and (e) string stability.
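To make the bilateral idea and the TTC metric concrete, the following sketch shows a simple linear bilateral control rule and a standard time-to-collision calculation. It is an illustration under assumptions, not the learned policy or the exact BCM formulation from the cited work: the linear form, the gain values `k_d` and `k_v`, and the function names are hypothetical.

```python
import math


def bilateral_acceleration(gap_ahead, gap_behind, v_ahead, v_ego, v_behind,
                           k_d=0.1, k_v=0.1):
    """Linear bilateral control rule (illustrative sketch with assumed gains).

    The ego vehicle accelerates when the gap ahead exceeds the gap behind,
    and when its neighbours are on average faster than it is.
    Gaps are in metres, speeds in metres per second.
    """
    spacing_term = k_d * (gap_ahead - gap_behind)
    speed_term = k_v * ((v_ahead - v_ego) + (v_behind - v_ego))
    return spacing_term + speed_term


def time_to_collision(gap_ahead, v_ego, v_ahead):
    """Time until the ego vehicle reaches the leader at current speeds.

    Returns infinity when the ego vehicle is not closing in on the leader.
    """
    closing_speed = v_ego - v_ahead
    return gap_ahead / closing_speed if closing_speed > 0 else math.inf
```

A unilateral rule would drop `gap_behind` and `v_behind` entirely; including them is what lets the vehicle centre itself between its neighbours. In an RL setting, the same quantities could be fed into the state and used to shape the reward.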
Reinforcement learning (RL) is a powerful data-driven control method that has been widely explored in autonomous driving tasks. However, conventional RL approaches learn control policies through trial-and-error interactions with the environment and may therefore cause disastrous consequences, such as collisions, when tested in real-world traffic. Offline RL has recently emerged as a promising framework for learning effective policies from previously collected, static datasets without requiring active interaction, making it especially appealing for autonomous driving applications. Despite their promise, existing offline RL algorithms such as Batch-Constrained deep Q-learning (BCQ) generally lead to rather conservative policies with limited exploration efficiency. To address these issues, this paper presents an enhanced BCQ algorithm that employs a learnable parameter noise scheme in the perturbation model to increase the diversity of observed actions. In addition, a Lyapunov-based safety enhancement strategy is incorporated to constrain the explorable state space within a safe region. Experimental results in highway and parking traffic scenarios show that our approach outperforms the conventional RL method as well as state-of-the-art offline RL algorithms.
Connected and automated vehicles (CAVs) have attracted increasing attention recently. Their fast actuation time gives them the potential to improve the efficiency and safety of the whole transportation system. Due to technical challenges, only a proportion of vehicles will be equipped with automation, while the others remain without automation. Instead of learning a reliable behavior for a single ego automated vehicle, we focus on how to improve the outcomes of the total transportation system by allowing automated vehicles to learn to cooperate with each other and regulate human-driven traffic flow. One state-of-the-art approach is to use reinforcement learning to learn an intelligent decision-making policy. However, a direct reinforcement learning framework cannot improve the performance of the whole system. In this article, we demonstrate that formulating the problem in a multi-agent setting with a shared policy can achieve better system performance than a non-shared policy in a single-agent setting. Furthermore, we find that applying an attention mechanism to interaction features can capture the interplay between agents and thereby boost cooperation. To the best of our knowledge, while previous automated driving studies mainly focus on enhancing an individual vehicle's driving performance, this work serves as a starting point for research on system-level multi-agent cooperation performance using graph information sharing. We conduct extensive experiments in car-following and unsignalized intersection settings. The results demonstrate that CAVs controlled by our method achieve the best performance against several state-of-the-art baselines.