Machine learning and formal methods have complimentary benefits and drawbacks. In this work, we address the controller-design problem with a combination of techniques from both fields. The use of black-box neural networks in deep reinforcement learning (deep RL) poses a challenge for such a combination. Instead of reasoning formally about the output of deep RL, which we call the {\em wizard}, we extract from it a decision-tree based model, which we refer to as the {\em magic book}. Using the extracted model as an intermediary, we are able to handle problems that are infeasible for either deep RL or formal methods by themselves. First, we suggest, for the first time, combining a magic book in a synthesis procedure. We synthesize a stand-alone correct-by-design controller that enjoys the favorable performance of RL. Second, we incorporate a magic book in a bounded model checking (BMC) procedure. BMC allows us to find numerous traces of the plant under the control of the wizard, which a user can use to increase the trustworthiness of the wizard and direct further training.
When robots operate in the real-world, they need to handle uncertainties in sensing, acting, and the environment. Many tasks also require reasoning about long-term consequences of robot decisions. The partially observable Markov decision process (POMDP) offers a principled approach for planning under uncertainty. However, its computational complexity grows exponentially with the planning horizon. We propose to use temporally-extended macro-actions to cut down the effective planning horizon and thus the exponential factor of the complexity. We propose Macro-Action Generator-Critic (MAGIC), an algorithm that learns a macro-action generator from data, and uses the learned macro-actions to perform long-horizon planning. MAGIC learns the generator using experience provided by an online planner, and in-turn conditions the planner using the generated macro-actions. We evaluate MAGIC on several long-term planning tasks, showing that it significantly outperforms planning using primitive actions, hand-crafted macro-actions, as well as naive reinforcement learning in both simulation and on a real robot.
Magic is the art of producing in the spectator an illusion of impossibility. Although the scientific study of magic is in its infancy, the advent of recent tracking algorithms based on deep learning allow now to quantify the skills of the magician in naturalistic conditions at unprecedented resolution and robustness. In this study, we deconstructed stage magic into purely motor maneuvers and trained an artificial neural network (DeepLabCut) to follow coins as a professional magician made them appear and disappear in a series of tricks. Rather than using AI as a mere tracking tool, we conceived it as an "artificial spectator". When the coins were not visible, the algorithm was trained to infer their location as a human spectator would (i.e. in the left fist). This created situations where the human was fooled while AI (as seen by a human) was not, and vice versa. Magic from the perspective of the machine reveals our own cognitive biases.
Strategy card game is a well-known genre that is demanding on the intelligent game-play and can be an ideal test-bench for AI. Previous work combines an end-to-end policy function and an optimistic smooth fictitious play, which shows promising performances on the strategy card game Legend of Code and Magic. In this work, we apply such algorithms to Hearthstone, a famous commercial game that is more complicated in game rules and mechanisms. We further propose several improved techniques and consequently achieve significant progress. For a machine-vs-human test we invite a Hearthstone streamer whose best rank was top 10 of the official league in China region that is estimated to be of millions of players. Our models defeat the human player in all Best-of-5 tournaments of full games (including both deck building and battle), showing a strong capability of decision making.
A Nonlinear Model Predictive Control (NMPC) strategy aimed at controlling a small-scale car model for autonomous racing competitions is presented in this paper. The proposed control strategy is concerned with minimizing the lap time while keeping the vehicle within track boundaries. The optimization problem considers both the vehicle's actuation limits and the lateral and longitudinal forces acting on the car modeled through the Pacejka's magic formula and a simple drivetrain model. Furthermore, the approach allows to safely race on a track populated by static obstacles generating collision-free trajectories and tracking them while enhancing the lap timing performance. Gazebo simulations using the F1/10 simulator showcase the feasibility and validity of the proposed control strategy. The code is released as open-source making it possible to replicate the obtained results.
The current certification process for aerospace software is not adapted to "AI-based" algorithms such as deep neural networks. Unlike traditional aerospace software, the precise parameters optimized during neural network training are as important as (or more than) the code processing the network and they are not directly mathematically understandable. Despite their lack of explainability such algorithms are appealing because for some applications they can exhibit high performance unattainable with any traditional explicit line-by-line software methods. This paper proposes a framework and principles that could be used to establish certification methods for neural network models for which the current certification processes such as DO-178 cannot be applied. While it is not a magic recipe, it is a set of common sense steps that will allow the applicant and the regulator increase their confidence in the developed software, by demonstrating the capabilities to bring together, trace, and track the requirements, data, software, training process, and test results.
Magic sets are a Datalog to Datalog rewriting technique to optimize query answering. The rewritten program focuses on a portion of the stable model(s) of the input program which is sufficient to answer the given query. However, the rewriting may introduce new recursive definitions, which can involve even negation and aggregations, and may slow down program evaluation. This paper enhances the magic set technique by preventing the creation of (new) recursive definitions in the rewritten program. It turns out that the new version of magic sets is closed for Datalog programs with stratified negation and aggregations, which is very convenient to obtain efficient computation of the stable model of the rewritten program. Moreover, the rewritten program is further optimized by the elimination of subsumed rules and by the efficient handling of the cases where binding propagation is lost. The research was stimulated by a challenge on the exploitation of Datalog/\textsc{dlv} for efficient reasoning on large ontologies. All proposed techniques have been hence implemented in the \textsc{dlv} system, and tested for ontological reasoning, confirming their effectiveness. Under consideration for publication in Theory and Practice of Logic Programming.
We present Magic Layouts; a method for parsing screenshots or hand-drawn sketches of user interface (UI) layouts. Our core contribution is to extend existing detectors to exploit a learned structural prior for UI designs, enabling robust detection of UI components; buttons, text boxes and similar. Specifically we learn a prior over mobile UI layouts, encoding common spatial co-occurrence relationships between different UI components. Conditioning region proposals using this prior leads to performance gains on UI layout parsing for both hand-drawn UIs and app screenshots, which we demonstrate within the context an interactive application for rapidly acquiring digital prototypes of user experience (UX) designs.
Generative language models (LMs) such as GPT-2/3 can be prompted to generate text with remarkable quality. While they are designed for text-prompted generation, it remains an open question how the generation process could be guided by modalities beyond text such as images. In this work, we propose a training-free framework, called MAGIC (iMAge-Guided text generatIon with CLIP), for plugging in visual controls in the generation process and enabling LMs to perform multimodal tasks (e.g., image captioning) in a zero-shot manner. MAGIC is a simple yet efficient plug-and-play framework, which directly combines an off-the-shelf LM (i.e., GPT-2) and an image-text matching model (i.e., CLIP) for image-grounded text generation. During decoding, MAGIC influences the generation of the LM by introducing a CLIP-induced score, called magic score, which regularizes the generated result to be semantically related to a given image while being coherent to the previously generated context. Notably, the proposed decoding scheme does not involve any gradient update operation, therefore being computationally efficient. On the challenging task of zero-shot image captioning, MAGIC outperforms the state-of-the-art method by notable margins with a nearly 27 times decoding speedup. MAGIC is a flexible framework and is theoretically compatible with any text generation tasks that incorporate image grounding. In the experiments, we showcase that it is also capable of performing visually grounded story generation given both an image and a text prompt.
Pre-training and then fine-tuning large language models is commonly used to achieve state-of-the-art performance in natural language processing (NLP) tasks. However, most pre-trained models suffer from low inference speed. Deploying such large models to applications with latency constraints is challenging. In this work, we focus on accelerating the inference via conditional computations. To achieve this, we propose a novel idea, Magic Pyramid (MP), to reduce both width-wise and depth-wise computation via token pruning and early exiting for Transformer-based models, particularly BERT. The former manages to save the computation via removing non-salient tokens, while the latter can fulfill the computation reduction by terminating the inference early before reaching the final layer, if the exiting condition is met. Our empirical studies demonstrate that compared to previous state of arts, MP is not only able to achieve a speed-adjustable inference but also to surpass token pruning and early exiting by reducing up to 70% giga floating point operations (GFLOPs) with less than 0.5% accuracy drop. Token pruning and early exiting express distinctive preferences to sequences with different lengths. However, MP is capable of achieving an average of 8.06x speedup on two popular text classification tasks, regardless of the sizes of the inputs.