Radu Florian

Slide, Constrain, Parse, Repeat: Synchronous Sliding Windows for Document AMR Parsing

May 26, 2023
Sadhana Kumaravel, Tahira Naseem, Ramon Fernandez Astudillo, Radu Florian, Salim Roukos

The sliding window approach provides an elegant way to handle contexts larger than the Transformer's input window, for tasks like language modeling. Here we extend this approach to the sequence-to-sequence task of document parsing. For this, we exploit recent progress in transition-based parsing to implement a parser with synchronous sliding windows over source and target. We develop an oracle and a parser for document-level AMR by expanding on Structured-BART such that it leverages source-target alignments and constrains decoding to guarantee synchronicity and consistency across overlapping windows. We evaluate our oracle and parser using the Abstract Meaning Representation (AMR) 3.0 corpus. On the Multi-Sentence development set of AMR 3.0, we show that our transition oracle loses only 8% of the gold cross-sentential links despite using a sliding window. In practice, this approach also yields a high-quality document-level parser with manageable memory requirements. Our proposed system performs on par with the state-of-the-art pipeline approach for document-level AMR parsing on the Multi-Sentence AMR 3.0 corpus while maintaining sentence-level parsing performance.
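
To make the windowing concrete, here is a minimal sketch of the synchronous sliding-window loop. The `parse` argument is a hypothetical function standing in for the paper's constrained Structured-BART decoder: it returns one subgraph per input sentence and can force decoding to reproduce already-parsed sentences on the overlap.

```python
# A minimal sketch of synchronous sliding windows over a document.
# `parse(sentences, forced_prefix=...)` is a hypothetical stand-in for the
# paper's constrained decoder, not its actual interface.
def parse_document(sentences, parse, window=8, stride=4):
    graphs = {}  # sentence index -> parsed subgraph
    for start in range(0, len(sentences), stride):
        span = sentences[start:start + window]
        # Sentences already parsed in the previous window are re-decoded
        # under constraints, guaranteeing synchronicity across windows.
        forced = [graphs[start + i] for i in range(len(span))
                  if start + i in graphs]
        for i, g in enumerate(parse(span, forced_prefix=forced)):
            graphs[start + i] = g  # identical on the overlap by construction
        if start + window >= len(sentences):
            break
    return [graphs[i] for i in range(len(sentences))]
```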

AMR Parsing with Instruction Fine-tuned Pre-trained Language Models

Apr 24, 2023
Young-Suk Lee, Ramón Fernandez Astudillo, Radu Florian, Tahira Naseem, Salim Roukos

Language models instruction-fine-tuned on collections of instruction-annotated datasets (FLAN) have proven highly effective at improving model performance and generalization to unseen tasks. However, most standard parsing tasks, including Abstract Meaning Representation (AMR), Universal Dependencies (UD), and semantic role labeling (SRL), have been excluded from the FLAN collections for both model training and evaluation. In this paper, we take one such instruction-fine-tuned pre-trained language model, FLAN-T5, and fine-tune it for AMR parsing. Our extensive experiments on various AMR parsing tasks including AMR2.0, AMR3.0 and BioAMR indicate that FLAN-T5 fine-tuned models outperform previous state-of-the-art models across all tasks. In addition, full fine-tuning followed by parameter-efficient fine-tuning with LoRA further improves performance, setting new state-of-the-art Smatch scores on AMR2.0 (86.4), AMR3.0 (84.9) and BioAMR (82.3).
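
As a rough illustration of the two-stage recipe, the sketch below applies LoRA on top of an already fully fine-tuned FLAN-T5 checkpoint using Hugging Face transformers and peft. The checkpoint path and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of the second stage: LoRA on top of a fully fine-tuned
# FLAN-T5 AMR parser. "my-flan-t5-amr-full-ft" is a hypothetical local path.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("my-flan-t5-amr-full-ft")

lora = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention query/value projections
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# Training then proceeds as usual, e.g. with Seq2SeqTrainer on
# (sentence -> linearized AMR graph) pairs.
```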

UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers

Mar 01, 2023
Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Arafat Sultan, Christopher Potts

Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generating a small number of synthetic queries using an expensive LLM. After that, a much less expensive one is used to create large numbers of synthetic queries, which are used to fine-tune a family of reranker models. These rerankers are then distilled into a single efficient retriever for use in the target domain. We show that this technique boosts zero-shot accuracy in long-tail domains, even where only 2K synthetic queries are used for fine-tuning, and that it achieves substantially lower latency than standard reranking methods. We make our end-to-end approach, including our synthetic datasets and replication code, publicly available on GitHub.
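
A high-level sketch of the two-stage query synthesis follows, with `expensive_llm` and `cheap_llm` as hypothetical text-completion functions and illustrative prompt wording, not the paper's exact prompts.

```python
# A minimal sketch of two-stage synthetic query generation: a few seed queries
# from an expensive LLM become few-shot demonstrations for a cheap LLM.
def synthesize_queries(passages, expensive_llm, cheap_llm,
                       n_seed=5, n_total=2000):
    # Stage 1: a handful of high-quality seed queries.
    seeds = [expensive_llm(f"Write a search query answered by:\n{p}")
             for p in passages[:n_seed]]
    demos = "\n".join(f"Passage: {p}\nQuery: {q}"
                      for p, q in zip(passages, seeds))
    # Stage 2: the cheap LLM imitates the seeds at scale.
    synthetic = []
    for p in passages[:n_total]:
        q = cheap_llm(f"{demos}\nPassage: {p}\nQuery:")
        synthetic.append((q, p))  # (query, positive passage) pairs
    return synthetic  # used to fine-tune rerankers, then distill a retriever
```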

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development

Jan 25, 2023
Avirup Sil, Jaydeep Sen, Bhavani Iyer, Martin Franz, Kshitij Fadnis, Mihaela Bornea, Sara Rosenthal, Scott McCarley, Rong Zhang, Vishwajeet Kumar, Yulong Li, Md Arafat Sultan, Riyaz Bhat, Radu Florian, Salim Roukos

The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers. In this paper, we introduce PRIMEQA: a one-stop, open-source QA repository that aims to democratize QA research and facilitate easy replication of state-of-the-art (SOTA) QA methods. PRIMEQA supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation. It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods. PRIMEQA is available at https://github.com/primeqa.
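
The snippet below is not PrimeQA's actual API (see the repository for that); it is a minimal sketch of the retriever-reader pattern the toolkit packages, using a Hugging Face question-answering pipeline as the reader and an assumed `retrieve` function as the retriever.

```python
# A generic retriever-reader sketch, illustrating the pattern only.
# The reader checkpoint is an illustrative choice.
from transformers import pipeline

reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

def answer(question, retrieve):
    # `retrieve` is any function returning candidate passages for a question.
    passages = retrieve(question, k=5)
    candidates = [reader(question=question, context=p) for p in passages]
    return max(candidates, key=lambda c: c["score"])  # best-scoring span
```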

Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

Dec 02, 2022
Keshav Santhanam, Jon Saad-Falcon, Martin Franz, Omar Khattab, Avirup Sil, Radu Florian, Md Arafat Sultan, Salim Roukos, Matei Zaharia, Christopher Potts

Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.
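
A minimal sketch of the kind of harness this implies, reporting accuracy together with latency percentiles; `search` (returning a ranked list of document ids) and the relevance judgments `qrels` are assumed inputs.

```python
# Benchmark a retrieval function on accuracy and latency together.
import time
import statistics

def benchmark(queries, qrels, search):
    # queries: list of (query_id, query_text); qrels: query_id -> relevant ids
    latencies, hits = [], 0
    for qid, text in queries:
        t0 = time.perf_counter()
        results = search(text)
        latencies.append(1000 * (time.perf_counter() - t0))  # milliseconds
        hits += any(doc in qrels[qid] for doc in results[:10])  # Success@10
    return {
        "success@10": hits / len(queries),
        "p50_latency_ms": statistics.median(latencies),
        "p95_latency_ms": statistics.quantiles(latencies, n=20)[18],
    }
```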

GAAMA 2.0: An Integrated System that Answers Boolean and Extractive Questions

Jun 21, 2022
Scott McCarley, Mihaela Bornea, Sara Rosenthal, Anthony Ferritto, Md Arafat Sultan, Avirup Sil, Radu Florian

Recent machine reading comprehension datasets include extractive and boolean questions, but current approaches do not offer integrated support for answering both question types. We present a multilingual machine reading comprehension system and front-end demo that handles boolean questions by providing both a YES/NO answer and highlighting supporting evidence, and handles extractive questions by highlighting the answer in the passage. Our system, GAAMA 2.0, is ranked first on the TyDi QA leaderboard at the time of this writing. We contrast two different implementations of our approach. The first includes several independent stacks of transformers, allowing easy deployment of each component. The second is a single stack of transformers utilizing adapters to reduce GPU memory footprint in a resource-constrained environment.
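
To illustrate the idea behind the second implementation, here is a generic bottleneck-adapter module in PyTorch; the dimensions are illustrative and this is not GAAMA 2.0's actual code.

```python
# A minimal bottleneck adapter: a small residual module inserted into a frozen
# transformer, so one shared stack can serve multiple tasks in GPU memory.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x):
        # Only `down`/`up` are trained per task; the base model stays frozen
        # and shared, cutting the per-task GPU memory footprint.
        return x + self.up(torch.relu(self.down(x)))
```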

Not to Overfit or Underfit? A Study of Domain Generalization in Question Answering

May 15, 2022
Md Arafat Sultan, Avirup Sil, Radu Florian

Machine learning models are prone to overfitting their source (training) distributions, which is commonly believed to be why they falter in novel target domains. Here we examine the contrasting view that multi-source domain generalization (DG) is in fact a problem of mitigating source domain underfitting: models not adequately learning the signal in their multi-domain training data. Experiments on a reading comprehension DG benchmark show that as a model gradually learns its source domains better -- using known methods such as knowledge distillation from a larger model -- its zero-shot out-of-domain accuracy improves at an even faster rate. Improved source domain learning also demonstrates superior generalization over three popular domain-invariant learning methods that aim to counter overfitting.
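
As a concrete instance of "learning the source domains better", the sketch below shows a standard knowledge-distillation loss of the kind the abstract mentions, applied to a student's logits against a larger teacher's; the temperature value is an illustrative choice, not the paper's exact setting.

```python
# Standard knowledge distillation: match the student's distribution to the
# teacher's temperature-softened distribution over the same outputs.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # KL(teacher || student), rescaled by T^2 as is conventional.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```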

Inducing and Using Alignments for Transition-based AMR Parsing

May 03, 2022
Andrew Drozdov, Jiawei Zhou, Radu Florian, Andrew McCallum, Tahira Naseem, Yoon Kim, Ramon Fernandez Astudillo

Transition-based parsers for Abstract Meaning Representation (AMR) rely on node-to-word alignments. These alignments are learned separately from parser training and require a complex pipeline of rule-based components, pre-processing, and post-processing to satisfy domain-specific constraints. Parsers also train on a point-estimate of the alignment pipeline, neglecting the uncertainty due to the inherent ambiguity of alignment. In this work we explore two avenues for overcoming these limitations. First, we propose a neural aligner for AMR that learns node-to-word alignments without relying on complex pipelines. We subsequently explore a tighter integration of aligner and parser training by considering a distribution over oracle action sequences arising from aligner uncertainty. Empirical results show this approach leads to more accurate alignments and better generalization from the AMR2.0 to the AMR3.0 corpus. We attain a new state of the art for gold-only trained models, matching silver-trained performance without the need for beam search on AMR3.0.

* Accepted at NAACL 2022 
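
A minimal sketch of training against a distribution over oracle action sequences follows; `aligner.sample`, `oracle_actions`, and `parser.neg_log_likelihood` are hypothetical helpers standing in for the paper's components.

```python
# Monte Carlo estimate of the expected parser loss under aligner uncertainty:
# sample alignments from the aligner's posterior, derive the corresponding
# oracle action sequences, and average the parser loss over them.
def alignment_marginal_loss(parser, aligner, sentence, amr_graph, n_samples=4):
    total = 0.0
    for _ in range(n_samples):
        alignment = aligner.sample(sentence, amr_graph)   # node-to-word map
        actions = oracle_actions(sentence, amr_graph, alignment)
        total = total + parser.neg_log_likelihood(sentence, actions)
    return total / n_samples
```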

Synthetic Target Domain Supervision for Open Retrieval QA

Apr 20, 2022
Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, Rong Zhang, Avirup Sil, Vittorio Castelli, Radu Florian, Salim Roukos

Neural passage retrieval is a new and promising approach in open retrieval question answering. In this work, we stress-test the Dense Passage Retriever (DPR) -- a state-of-the-art (SOTA) open domain neural retrieval model -- on closed and specialized target domains such as COVID-19, and find that it lags behind standard BM25 in this important real-world setting. To make DPR more robust under domain shift, we explore its fine-tuning with synthetic training examples, which we generate from unlabeled target domain text using a text-to-text generator. In our experiments, this noisy but fully automated target domain supervision gives DPR a sizable advantage over BM25 in out-of-domain settings, making it a more viable model in practice. Finally, an ensemble of BM25 and our improved DPR model yields the best results, further pushing the SOTA for open retrieval QA on multiple out-of-domain test sets.

* Published at SIGIR 2021 
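
A minimal sketch of the synthetic supervision step follows, using a T5-style text-to-text generator to turn unlabeled target-domain passages into noisy (question, passage) pairs; the checkpoint and prompt prefix are illustrative assumptions, not necessarily the paper's generator.

```python
# Generate synthetic (question, passage) pairs from unlabeled domain text,
# to be used as noisy positives when fine-tuning a dense retriever like DPR.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-base")
qgen = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # assume QG fine-tuned

def synthetic_pairs(passages, max_new_tokens=48):
    pairs = []
    for p in passages:
        ids = tok("generate question: " + p, return_tensors="pt",
                  truncation=True).input_ids
        out = qgen.generate(ids, max_new_tokens=max_new_tokens, do_sample=True)
        pairs.append((tok.decode(out[0], skip_special_tokens=True), p))
    return pairs
```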

A Generative Model for Relation Extraction and Classification

Feb 26, 2022
Jian Ni, Gaetano Rossiello, Alfio Gliozzo, Radu Florian

Relation extraction (RE) is an important information extraction task which provides essential information to many NLP applications such as knowledge base population and question answering. In this paper, we present a novel generative model for relation extraction and classification (which we call GREC), where RE is modeled as a sequence-to-sequence generation task. We explore various encoding representations for the source and target sequences, and design effective schemes that enable GREC to achieve state-of-the-art performance on three benchmark RE datasets. In addition, we introduce negative sampling and decoding scaling techniques which provide a flexible tool to tune the precision and recall performance of the model. Our approach can be extended to extract all relation triples from a sentence in one pass. Although the one-pass approach incurs some performance loss, it is much more computationally efficient.
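
A minimal sketch of the seq2seq framing follows, with an illustrative linearization scheme; the paper explores several encoding representations, and this markup is an assumption rather than GREC's exact format.

```python
# Cast relation extraction as text-to-text: the source marks the entity pair
# in context, and the target is a linearized relation triple.
def make_example(sentence, head, tail, relation):
    source = f"relation between [{head}] and [{tail}]: {sentence}"
    target = f"{head} | {relation} | {tail}"
    return source, target

src, tgt = make_example(
    "Marie Curie was born in Warsaw.", "Marie Curie", "Warsaw", "birthplace")
# A seq2seq model (e.g. BART/T5) is fine-tuned on such (src, tgt) pairs;
# at inference, the generated string is parsed back into a triple.
```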
