Consistent and reproducible evaluation of Deep Reinforcement Learning (DRL) is not straightforward. In the Arcade Learning Environment (ALE), small changes in environment parameters such as stochasticity or the maximum allowed play time can lead to very different performance. In this work, we discuss the difficulties of comparing different agents trained on ALE. In order to take a step further towards reproducible and comparable DRL, we introduce SABER, a Standardized Atari BEnchmark for general Reinforcement learning algorithms. Our methodology extends previous recommendations and contains a complete set of environment parameters as well as train and test procedures. We then use SABER to evaluate the current state of the art, Rainbow. Furthermore, we introduce a human world records baseline, and argue that previous claims of expert or superhuman performance of DRL might not be accurate. Finally, we propose Rainbow-IQN by extending Rainbow with Implicit Quantile Networks (IQN) leading to new state-of-the-art performance. Source code is available for reproducibility.
Clinical decision support systems (CDSS) will play an in-creasing role in improving the quality of medical care for critically ill patients. However, due to limitations in current informatics infrastructure, CDSS do not always have com-plete information on state of supporting physiologic monitor-ing devices, which can limit the input data available to CDSS. This is especially true in the use case of mechanical ventilation (MV), where current CDSS have no knowledge of critical ventilation settings, such as ventilation mode. To enable MV CDSS to make accurate recommendations related to ventilator mode, we developed a highly performant ma-chine learning model that is able to perform per-breath clas-sification of 5 of the most widely used ventilation modes in the USA with an average F1-score of 97.52%. We also show how our approach makes methodologic improvements over previous work and that it is highly robust to missing data caused by software/sensor error.
Recent advancements in graph neural networks (GNN) have led to state-of-the-art performance in various applications including chemo-informatics, question answering systems, and recommendation systems, to name a few. However, making these methods scalable to huge graphs such as web-mining remains a challenge. In particular, the existing methods for accelerating GNN are either not theoretically guaranteed in terms of approximation error or require at least linear time computation cost. In this paper, we propose a constant time approximation algorithm for the inference and training of GNN that theoretically guarantees arbitrary precision with arbitrary probability. The key advantage of the proposed algorithm is that the complexity is completely independent of the number of nodes, edges, and neighbors of the input. To the best of our knowledge, this is the first constant time approximation algorithm for GNN with theoretical guarantee. Through experiments using synthetic and real-world datasets, we evaluate our proposed approximation algorithm and show that the algorithm can successfully approximate GNN in constant time.
Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
Inspired by recent successes of Monte-Carlo tree search (MCTS) in a number of artificial intelligence (AI) application domains, we propose a model-based reinforcement learning (RL) technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon Markov decision process. The terminal condition of the finite-horizon problems, or the leaf-node evaluator of the decision tree generated by MCTS, is specified using a combination of an estimated value function and an estimated policy function. The recommendations generated by the MCTS procedure are then provided as feedback in order to refine, through classification and regression, the leaf-node evaluator for the next iteration. We provide the first sample complexity bounds for a tree search-based RL algorithm. In addition, we show that a deep neural network implementation of the technique can create a competitive AI agent for the popular multi-player online battle arena (MOBA) game King of Glory.
The Clustered Vehicle Routing Problem (CluVRP) is a variant of the Capacitated Vehicle Routing Problem in which customers are grouped into clusters. Each cluster has to be visited once, and a vehicle entering a cluster cannot leave it until all customers have been visited. This article presents two alternative hybrid metaheuristic algorithms for the CluVRP. The first algorithm is based on an Iterated Local Search algorithm, in which only feasible solutions are explored and problem-specific local search moves are utilized. The second algorithm is a Hybrid Genetic Search, for which the shortest Hamiltonian path between each pair of vertices within each cluster should be precomputed. Using this information, a sequence of clusters can be used as a solution representation and large neighborhoods can be efficiently explored by means of bi-directional dynamic programming, sequence concatenations, by using appropriate data structures. Extensive computational experiments are performed on benchmark instances from the literature, as well as new large scale ones. Recommendations on promising algorithm choices are provided relatively to average cluster size.
Teaching is one of the most important factors affecting any education system. Many research efforts have been conducted to facilitate the presentation modes used by instructors in classrooms as well as provide means for students to review lectures through web browsers. Other studies have been made to provide acoustical design recommendations for classrooms like room size and reverberation times. However, using acoustical features of classrooms as a way to provide education systems with feedback about the learning process was not thoroughly investigated in any of these studies. We propose a system that extracts different sound features of students and instructors, and then uses machine learning techniques to evaluate the acoustical quality of any learning environment. We infer conclusions about the students' satisfaction with the quality of lectures. Using classifiers instead of surveys and other subjective ways of measures can facilitate and speed such experiments which enables us to perform them continuously. We believe our system enables education systems to continuously review and improve their teaching strategies and acoustical quality of classrooms.
The way people respond to messaging from public health organizations on social media can provide insight into public perceptions on critical health issues, especially during a global crisis such as COVID-19. It could be valuable for high-impact organizations such as the US Centers for Disease Control and Prevention (CDC) or the World Health Organization (WHO) to understand how these perceptions impact reception of messaging on health policy recommendations. We collect two datasets of public health messages and their responses from Twitter relating to COVID-19 and Vaccines, and introduce a predictive method which can be used to explore the potential reception of such messages. Specifically, we harness a generative model (GPT-2) to directly predict probable future responses and demonstrate how it can be used to optimize expected reception of important health guidance. Finally, we introduce a novel evaluation scheme with extensive statistical testing which allows us to conclude that our models capture the semantics and sentiment found in actual public health responses.
This work addresses the efficiency concern on inferring a nonlinear contextual bandit when the number of arms $n$ is very large. We propose a neural bandit model with an end-to-end training process to efficiently perform bandit algorithms such as Thompson Sampling and UCB during inference. We advance state-of-the-art time complexity to $O(\log n)$ with approximate Bayesian inference, neural random feature mapping, approximate global maxima and approximate nearest neighbor search. We further propose a generative adversarial network to shift the bottleneck of maximizing the objective for selecting optimal arms from inference time to training time, enjoying significant speedup with additional advantage of enabling batch and parallel processing. %The generative model can inference an approximate argmax of the posterior sampling in logarithmic time complexity with the help of approximate nearest neighbor search. Extensive experiments on classification and recommendation tasks demonstrate order-of-magnitude improvement in inference time no significant degradation on the performance.