Marco Damonte

CLASP: Few-Shot Cross-Lingual Data Augmentation for Semantic Parsing

Oct 14, 2022
Andy Rosenbaum, Saleh Soltan, Wael Hamza, Amir Saffari, Marco Damonte, Isabel Groves

A bottleneck to developing Semantic Parsing (SP) models is the need for a large volume of human-labeled training data. Given the complexity and cost of human annotation for SP, labeled data is often scarce, particularly in multilingual settings. Large Language Models (LLMs) excel at SP given only a few examples; however, LLMs are unsuitable for runtime systems that require low latency. In this work, we propose CLASP, a simple method to improve low-resource SP for moderate-sized models: we generate synthetic data from AlexaTM 20B to augment the training set for a model 40x smaller (500M parameters). We evaluate on two datasets in low-resource settings: English PIZZA, containing either 348 or 16 real examples, and mTOP cross-lingual zero-shot, where training data is available only in English and the model must generalize to four new languages. On both datasets, we show significant improvements over strong baseline methods.

* Accepted to AACL-IJCNLP 2022: The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, November 20-23, 2022. See https://www.aacl2022.org/ 
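
The recipe lends itself to a compact sketch: prompt the large model with the few real examples, sample new (utterance, parse) pairs, filter the obviously malformed ones, and add the survivors to the small model's training set. Below is a minimal, hypothetical version; the prompt template, the generate interface, and the bracket-balance filter are illustrative assumptions, not the paper's exact setup.

    # Hypothetical sketch of LLM-based data augmentation for semantic
    # parsing. Prompt format, sampling interface, and filter are assumptions.

    def make_prompt(seed_examples, sep="\t"):
        # Few-shot prompt: one "utterance ... parse" line per real example,
        # ending with an open slot for the model to complete.
        lines = [f"utterance: {u}{sep}parse: {p}" for u, p in seed_examples]
        lines.append("utterance:")
        return "\n".join(lines)

    def is_well_formed(parse):
        # Cheap filter: keep only parses with balanced brackets.
        depth = 0
        for ch in parse:
            depth += (ch == "[") - (ch == "]")
            if depth < 0:
                return False
        return depth == 0

    def augment(seed_examples, generate, n_samples=1000):
        # `generate` is any callable mapping a prompt to sampled text,
        # e.g. a wrapper around a large pretrained LM.
        synthetic = []
        for _ in range(n_samples):
            completion = generate(make_prompt(seed_examples)).strip()
            line = completion.splitlines()[0] if completion else ""
            if "\tparse: " in line:
                utterance, parse = line.split("\tparse: ", 1)
                if is_well_formed(parse):
                    synthetic.append((utterance.strip(), parse.strip()))
        return synthetic

The real pairs plus the synthetic survivors then form the training set for the 500M-parameter runtime model.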

One Semantic Parser to Parse Them All: Sequence to Sequence Multi-Task Learning on Semantic Parsing Datasets

Jun 14, 2021
Marco Damonte, Emilio Monti

Semantic parsers map natural language utterances to meaning representations. The lack of a single standard for meaning representations has led to the creation of a plethora of semantic parsing datasets. To unify different datasets and train a single model on all of them, we investigate the use of Multi-Task Learning (MTL) architectures. We experiment with five datasets (Geoquery, NLMaps, TOP, Overnight, AMR). We find that an MTL architecture that shares the entire network across datasets yields competitive or better parsing accuracies than the single-task baselines, while reducing the total number of parameters by 68%. We further provide evidence that MTL also improves compositional generalization over single-task models. Finally, we present a comparison of task sampling methods and propose a competitive alternative to widespread proportional sampling strategies.

* The Tenth Joint Conference on Lexical and Computational Semantics (*SEM 2021)  
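
To make the sampling comparison concrete, the sketch below contrasts proportional sampling with temperature scaling, a common way to upweight small datasets; the scaling scheme and the dataset sizes are illustrative placeholders, not necessarily the alternative the paper proposes.

    import random

    def sampling_weights(dataset_sizes, temperature=1.0):
        # temperature=1.0 is plain proportional sampling; higher values
        # flatten the distribution toward uniform over datasets.
        scaled = [n ** (1.0 / temperature) for n in dataset_sizes]
        total = sum(scaled)
        return [s / total for s in scaled]

    # Rough placeholder sizes for the five training sets.
    sizes = {"Geoquery": 600, "NLMaps": 1500, "TOP": 31000,
             "Overnight": 8800, "AMR": 36000}

    for t in (1.0, 2.0, 5.0):
        weights = sampling_weights(list(sizes.values()), temperature=t)
        picked = random.choices(list(sizes), weights=weights, k=1)[0]
        print(f"T={t}: {[round(w, 3) for w in weights]} -> sampled {picked}")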

Structural Neural Encoders for AMR-to-text Generation

May 20, 2019
Marco Damonte, Shay B. Cohen

AMR-to-text generation is a problem recently introduced to the NLP community, in which the goal is to generate sentences from Abstract Meaning Representation (AMR) graphs. Sequence-to-sequence models can be used to this end by converting the AMR graphs to strings. Approaching the problem while working directly with graphs requires graph-to-sequence models that encode the AMR graph into a vector representation. Such encoding has been shown to be beneficial in the past and, unlike sequential encoding, it allows us to explicitly capture reentrant structures in the AMR graphs. We investigate the extent to which reentrancies (nodes with multiple parents) have an impact on AMR-to-text generation by comparing graph encoders to tree encoders, where reentrancies are not preserved. We show that improvements in the treatment of reentrancies and long-range dependencies contribute to higher overall scores for graph encoders. Our best model achieves 24.40 BLEU on LDC2015E86, outperforming the state of the art by 1.1 points, and 24.54 BLEU on LDC2017T10, outperforming it by 1.24 points.

* Proceedings of NAACL 2019 
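
The cost of tree encoding is easy to see in code. In the simplified illustration below (not the paper's implementation), unfolding an AMR-like graph into a tree duplicates the reentrant node, losing the information that both mentions refer to the same entity.

    # AMR-like graph for "He wants to sleep": "he" is the ARG0 of both
    # want-01 and sleep-01, i.e. a reentrant node with two parents.
    graph = {
        "want-01": [("ARG0", "he"), ("ARG1", "sleep-01")],
        "sleep-01": [("ARG0", "he")],
        "he": [],
    }

    def unfold(graph, node):
        # Depth-first unfolding into a nested tree; assumes the graph is
        # acyclic, as AMR graphs are. The reentrant node "he" appears twice
        # in the output, and the identity link between the copies is gone.
        return (node, [(rel, unfold(graph, child)) for rel, child in graph[node]])

    print(unfold(graph, "want-01"))
    # ('want-01', [('ARG0', ('he', [])),
    #              ('ARG1', ('sleep-01', [('ARG0', ('he', []))]))])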

Practical Semantic Parsing for Spoken Language Understanding

Mar 19, 2019
Marco Damonte, Rahul Goel, Tagyoung Chung

Executable semantic parsing is the task of converting natural language utterances into logical forms that can be used directly as queries to get a response. We build a transfer learning framework for executable semantic parsing and show that the framework is effective for Question Answering (Q&A) as well as for Spoken Language Understanding (SLU). We further investigate the case where a parser for a new domain can be learned by exploiting data from other domains, either via multi-task learning between the target domain and an auxiliary domain, or via pre-training on the auxiliary domain and fine-tuning on the target domain. With either flavor of transfer learning, we are able to improve performance on most domains; we experiment with public datasets such as Overnight and NLmaps as well as with commercial SLU data. The experiments, carried out on datasets of different natures, show how executable semantic parsing can unify different areas of NLP, such as Q&A and SLU.

* Proceedings of NAACL 2019  
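
The two transfer flavors reduce to a small training-loop schematic; the train_step callable and the batch lists below are hypothetical stand-ins for a real trainer, not the paper's codebase.

    import random

    def pretrain_then_finetune(model, aux_batches, target_batches, train_step):
        # Flavor 1: pre-train on the auxiliary domain, then fine-tune
        # on the target domain.
        for batch in aux_batches:
            train_step(model, batch)
        for batch in target_batches:
            train_step(model, batch)

    def multitask(model, aux_batches, target_batches, train_step, steps=10000):
        # Flavor 2: a single run that interleaves batches from both domains.
        for _ in range(steps):
            pool = random.choice((aux_batches, target_batches))
            train_step(model, random.choice(pool))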

Talking to myself: self-dialogues as data for conversational agents

Sep 19, 2018
Joachim Fainberg, Ben Krause, Mihai Dobre, Marco Damonte, Emmanuel Kahembwe, Daniel Duma, Bonnie Webber, Federico Fancellu

Conversational agents are gaining popularity with the increasing ubiquity of smart devices. However, training agents in a data-driven manner is challenging due to the lack of suitable corpora. This paper presents a novel method for gathering topical, unstructured conversational data efficiently: self-dialogues collected through crowd-sourcing. Alongside this paper, we include a corpus of 3.6 million words across 23 topics. We argue for the utility of the corpus by comparing self-dialogues with standard two-party conversations as well as with data from other corpora.

* 5 pages, 5 pages appendix, 2 figures 
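
One plausible downstream use, an assumption on our part rather than the paper's pipeline, is to slice each self-dialogue into context/response pairs for training a conversational agent:

    # Toy self-dialogue: both sides written by a single crowd worker.
    dialogue = [
        "Did you watch the game last night?",
        "I did! I could not believe that final play.",
        "Same here, I was on the edge of my seat.",
    ]

    def to_pairs(turns):
        # Every dialogue prefix predicts the next turn.
        return [(turns[:i], turns[i]) for i in range(1, len(turns))]

    for context, response in to_pairs(dialogue):
        print(context, "->", response)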

Edina: Building an Open Domain Socialbot with Self-dialogues

Sep 28, 2017
Ben Krause, Marco Damonte, Mihai Dobre, Daniel Duma, Joachim Fainberg, Federico Fancellu, Emmanuel Kahembwe, Jianpeng Cheng, Bonnie Webber

We present Edina, the University of Edinburgh's social bot for the Amazon Alexa Prize competition. Edina is a conversational agent whose responses utilize data harvested from Amazon Mechanical Turk (AMT) through a novel technique we call self-dialogues. These are conversations in which a single AMT Worker plays both participants in a dialogue. Such dialogues are surprisingly natural, efficient to collect, and reflective of relevant and/or trending topics. These self-dialogues provide training data for a generative neural network as well as a basis for soft rules used by a matching-score component. Each match of a soft rule against a user utterance is associated with a confidence score, which we show is strongly indicative of reply quality, allowing this component to self-censor and be effectively integrated with other components. Edina's full architecture features a rule-based system that backs off to the matching-score component, which in turn backs off to the generative neural network. Our hybrid data-driven methodology thus addresses both the coverage limitations of a strictly rule-based approach and the lack of guarantees of a strictly machine-learning approach.

* 10 pages; submitted to the 1st Proceedings of the Alexa Prize 
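
The back-off cascade can be sketched in a few lines; the component interfaces and the 0.7 threshold below are illustrative assumptions, not Edina's actual values.

    def respond(utterance, rules, matcher, generator, threshold=0.7):
        # 1) Rule-based system: answer directly when a rule fires.
        reply = rules(utterance)
        if reply is not None:
            return reply
        # 2) Matching score over the self-dialogues; the confidence score
        #    lets this component self-censor low-quality matches.
        reply, confidence = matcher(utterance)
        if confidence >= threshold:
            return reply
        # 3) Generative neural network as the final fallback.
        return generator(utterance)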

An Incremental Parser for Abstract Meaning Representation

Apr 10, 2017
Marco Damonte, Shay B. Cohen, Giorgio Satta

Abstract Meaning Representation (AMR) is a semantic representation for natural language that embeds annotations related to traditional tasks such as named entity recognition, semantic role labeling, word sense disambiguation, and co-reference resolution. We describe a transition-based parser for AMR that parses sentences left-to-right, in linear time. We further propose a test suite that assesses specific subtasks helpful in comparing AMR parsers, and show that our parser is competitive with the state of the art on the LDC2015E86 dataset and that it outperforms state-of-the-art parsers at recovering named entities and handling polarity.

* EACL 2017 
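
A toy skeleton of a left-to-right transition parser shows where the linear-time guarantee comes from: each token enters the stack exactly once and leaves it at most once. The Shift/Reduce/RightArc transition set here is a generic simplification; the paper defines its own AMR-specific transitions.

    def parse(tokens, choose_action):
        stack, edges = [], []
        buffer = list(tokens)
        while buffer:
            action = choose_action(stack, buffer)     # e.g. a trained classifier
            if action == "reduce" and stack:
                stack.pop()                           # stack top takes no more edges
            elif action == "rightarc" and stack:
                edges.append((stack[-1], buffer[0]))  # head on stack, dependent in buffer
                stack.append(buffer.pop(0))           # then shift the dependent
            else:
                stack.append(buffer.pop(0))           # default transition: shift
        return edges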