How can an end-user provide feedback if a deployed structured prediction model generates incorrect output? Our goal is to allow users to correct errors directly through interaction, without retraining, by giving feedback on the model's output. We create a dynamic memory architecture with a growing memory of feedbacks about errors in the output. Given a new, unseen input, our model can use feedback from a similar, past erroneous state. On a script generation task, we show empirically that the model learns to apply feedback effectively (up to 30 points improvement), while avoiding similar past mistakes after deployment (up to 10 points improvement on an unseen set). This is a first step towards strengthening deployed models, potentially broadening their utility.
How can an end-user provide feedback if a deployed structured prediction model generates inconsistent output, ignoring the structural complexity of human language? This is an emerging topic with recent progress in synthetic or constrained settings, and the next big leap would require testing and tuning models in real-world settings. We present a new dataset, Interscript, containing user feedback on a deployed model that generates complex everyday tasks. Interscript contains 8,466 data points -- the input is a possibly erroneous script and a user feedback, and the output is a modified script. We posit two use-cases of \ours that might significantly advance the state-of-the-art in interactive learning. The dataset is available at: https://github.com/allenai/interscript.
Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence. Existing cognitive science literature on defeasible reasoning suggests that a person forms a mental model of the problem scenario before answering questions. Our research goal asks whether neural models can similarly benefit from envisioning the question scenario before answering a defeasible query. Our approach is, given a question, to have a model first create a graph of relevant influences, and then leverage that graph as an additional input when answering the question. Our system, CURIOUS, achieves a new state-of-the-art on three different defeasible reasoning datasets. This result is significant as it illustrates that performance can be improved by guiding a system to "think about" a question and explicitly model the scenario, rather than answering reflexively. Code, data, and pre-trained models are located at https://github.com/madaan/thinkaboutit.
The landscape of city-wide mobility behaviour has altered significantly over the past 18 months. The ability to make accurate and reliable predictions on such behaviour has likewise changed drastically with COVID-19 measures impacting how populations across the world interact with the different facets of mobility. This raises the question: "How does one use an abundance of pre-covid mobility data to make predictions on future behaviour in a present/post-covid environment?" This paper seeks to address this question by introducing an approach for traffic frame prediction using a lightweight Dual-Encoding U-Net built using only 12 Convolutional layers that incorporates a novel approach to skip-connections between Convolutional LSTM layers. This approach combined with an intuitive handling of training data can model both a temporal and spatio-temporal domain shift (gitlab.com/alchera/alchera-traffic4cast-2021).
Current Open-Domain Question Answering (ODQA) model paradigm often contains a retrieving module and a reading module. Given an input question, the reading module predicts the answer from the relevant passages which are retrieved by the retriever. The recent proposed Fusion-in-Decoder (FiD), which is built on top of the pretrained generative model T5, achieves the state-of-the-art performance in the reading module. Although being effective, it remains constrained by inefficient attention on all retrieved passages which contain a lot of noise. In this work, we propose a novel method KG-FiD, which filters noisy passages by leveraging the structural relationship among the retrieved passages with a knowledge graph. We initiate the passage node embedding from the FiD encoder and then use graph neural network (GNN) to update the representation for reranking. To improve the efficiency, we build the GNN on top of the intermediate layer output of the FiD encoder and only pass a few top reranked passages into the higher layers of encoder and decoder for answer generation. We also apply the proposed GNN based reranking method to enhance the passage retrieval results in the retrieving module. Extensive experiments on common ODQA benchmark datasets (Natural Question and TriviaQA) demonstrate that KG-FiD can improve vanilla FiD by up to 1.5% on answer exact match score and achieve comparable performance with FiD with only 40% of computation cost.
Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence. A commonly used method in cognitive science and logic literature is to handcraft argumentation supporting inference graphs. While humans find inference graphs very useful for reasoning, constructing them at scale is difficult. In this paper, we automatically generate such inference graphs through transfer learning from another NLP task that shares the kind of reasoning that inference graphs support. Through automated metrics and human evaluation, we find that our method generates meaningful graphs for the defeasible inference task. Human accuracy on this task improves by 20% by consulting the generated graphs. Our findings open up exciting new research avenues for cases where machine reasoning can help human reasoning. (A dataset of 230,000 influence graphs for each defeasible query is located at: https://tinyurl.com/defeasiblegraphs.)
A class of explainable NLP models for reasoning tasks support their decisions by generating free-form or structured explanations, but what happens when these supporting structures contain errors? Our goal is to allow users to interactively correct explanation structures through natural language feedback. We introduce MERCURIE - an interactive system that refines its explanations for a given reasoning task by getting human feedback in natural language. Our approach generates graphs that have 40% fewer inconsistencies as compared with the off-the-shelf system. Further, simply appending the corrected explanation structures to the output leads to a gain of 1.2 points on accuracy on defeasible reasoning across all three domains. We release a dataset of over 450k graphs for defeasible reasoning generated by our system at https://tinyurl.com/mercurie .
Different from traditional knowledge graphs (KGs) where facts are represented as entity-relation-entity triplets, hyper-relational KGs (HKGs) allow triplets to be associated with additional relation-entity pairs (a.k.a qualifiers) to convey more complex information. How to effectively and efficiently model the triplet-qualifier relationship for prediction tasks such as HKG completion is an open challenge for research. This paper proposes to improve the best-performing method in HKG completion, namely STARE, by introducing two novel revisions: (1) Replacing the computation-heavy graph neural network module with light-weight entity/relation embedding processing techniques for efficiency improvement without sacrificing effectiveness; (2) Adding a qualifier-oriented auxiliary training task for boosting the prediction power of our approach on HKG completion. The proposed approach consistently outperforms STARE in our experiments on three benchmark datasets, with significantly improved computational efficiency.
Acquiring dynamics is an essential topic in robot learning, but up-to-date methods, such as dynamics randomization, need to restart to check nominal parameters, generate simulation data, and train networks whenever they face different robots. To improve it, we novelly investigate general robot dynamics, its inverse models, and Gen2Real, which means transferring to reality. Our motivations are to build a model that learns the intrinsic dynamics of various robots and lower the threshold of dynamics learning by enabling an amateur to obtain robot models without being trapped in details. This paper achieves the "generality" by randomizing dynamics parameters, topology configurations, and model dimensions, which in sequence cover the property, the connection, and the number of robot links. A structure modified from GPT is applied to access the pre-training model of general dynamics. We also study various inverse models of dynamics to facilitate different applications. We step further to investigate a new concept, "Gen2Real", to transfer simulated, general models to physical, specific robots. Simulation and experiment results demonstrate the validity of the proposed models and method.\footnote{ These authors contribute equally.