Ramesh Nallapati

Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding

Sep 28, 2022

Jun Wang, Patrick Ng, Alexander Hanbo Li, Jiarong Jiang, Zhiguo Wang, Ramesh Nallapati, Bing Xiang, Sudipta Sengupta

Abstract: Most recent research on Text-to-SQL semantic parsing relies on either the parser itself or a simple heuristic-based approach to understand the natural language query (NLQ). When a SQL query is synthesized, no explicit semantic information about the NLQ is available to the parser, which leads to undesirable generalization performance. In addition, without lexical-level, fine-grained query understanding, linking between the query and the database can rely only on fuzzy string matching, which leads to suboptimal performance in real applications. In view of this, we present a general-purpose, modular neural semantic parsing framework based on token-level, fine-grained query understanding. Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL), and a neural semantic parser (NSP). By jointly modeling the query and the database, the NER model analyzes user intents and identifies entities in the query. The NEL model links typed entities to the schema and cell values in the database. The parser model leverages the available semantic information and linking results to synthesize tree-structured SQL queries based on a dynamically generated grammar. Experiments on SQUALL, a newly released semantic parsing dataset, show that we achieve 56.8% execution accuracy on the WikiTableQuestions (WTQ) test set, outperforming the state-of-the-art model by 2.7%.

* EMNLP Industry Track 2022
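
The abstract contrasts its neural entity linker with the baseline it criticizes: linking query spans to the database by fuzzy string matching alone. The sketch below shows roughly what such a baseline looks like, using Python's standard `difflib`. The function name `fuzzy_link`, the 0.8 threshold, and the toy column names are illustrative assumptions, not part of the paper's system.

```python
from difflib import SequenceMatcher


def fuzzy_link(mention, candidates, threshold=0.8):
    """Baseline schema linking by character-level fuzzy matching (hypothetical helper).

    Returns (best_candidate, score) if the best similarity ratio reaches
    `threshold`, otherwise None. This is the kind of heuristic the paper argues
    is insufficient, not its neural entity linker.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        # ratio() is 2*M/T: M = matched characters, T = total characters in both strings.
        score = SequenceMatcher(None, mention.lower(), cand.lower()).ratio()
        if score > best_score:
            best, best_score = cand, score
    return (best, best_score) if best_score >= threshold else None


columns = ["year", "team", "attendance"]
print(fuzzy_link("attendence", columns))  # ('attendance', 0.9): a misspelling still links
print(fuzzy_link("crowd", columns))       # None: a synonym does not link
```

A synonym such as "crowd" for "attendance" shares almost no characters with the column name, so character-level similarity cannot link it; this is the gap the abstract attributes to fuzzy-match-only linking.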

Efficient Few-Shot Fine-Tuning for Opinion Summarization

May 08, 2022

Arthur Bražinskas, Ramesh Nallapati, Mohit Bansal, Markus Dreyer

Abstract: Abstractive summarization models are typically pre-trained on large amounts of generic text, then fine-tuned on tens or hundreds of thousands of annotated samples. However, in opinion summarization, large annotated datasets of reviews paired with reference summaries are not available and would be expensive to create. This calls for fine-tuning methods that are robust to overfitting on small datasets. In addition, generically pre-trained models are often not accustomed to the specifics of customer reviews and, after fine-tuning, yield summaries with disfluencies and semantic mistakes. To address these problems, we use an efficient few-shot method based on adapters which, as we show, can easily store in-domain knowledge. Instead of fine-tuning the entire model, we add adapters and pre-train them in a task-specific way on a large corpus of unannotated customer reviews, using held-out reviews as pseudo summaries. We then fine-tune the adapters on the small human-annotated dataset that is available. We show that this self-supervised adapter pre-training improves summary quality over standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp datasets, respectively. Finally, for summary personalization, we condition on aspect keyword queries, automatically created from generic datasets. In the same vein, we pre-train the adapters in a query-based manner on customer reviews and then fine-tune them on annotated datasets. This results in better-organized summary content, reflected in improved coherence and fewer redundancies.

* NAACL Findings 2022
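
The self-supervised pre-training step uses held-out reviews as pseudo summaries. A minimal sketch of that data construction, assuming the simplest leave-one-out scheme (each review of a product serves once as the target, and a random subset of the remaining reviews forms the input), might look like the following. The helper name `make_pseudo_pairs` and its `num_inputs` and `seed` parameters are hypothetical; the paper's actual selection of pseudo summaries may differ.

```python
import random


def make_pseudo_pairs(reviews, num_inputs, seed=0):
    """Leave-one-out construction of (input_reviews, pseudo_summary) pairs.

    Hypothetical helper: every review of one product is held out in turn as a
    pseudo summary, and `num_inputs` of the other reviews are sampled as input.
    """
    rng = random.Random(seed)
    pairs = []
    for i, target in enumerate(reviews):
        others = reviews[:i] + reviews[i + 1:]
        if len(others) < num_inputs:
            continue  # too few remaining reviews to form an input set
        pairs.append((rng.sample(others, num_inputs), target))
    return pairs


reviews = ["Great battery.", "Screen is dim.", "Fast shipping.", "Battery died fast."]
for inputs, pseudo_summary in make_pseudo_pairs(reviews, num_inputs=2):
    print(pseudo_summary, "<-", inputs)
```

Fixing the seed keeps the pairs reproducible, and sampling a subset rather than using every other review keeps the input length bounded for products with many reviews.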

DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization

Mar 21, 2022

Zheng Li, Zijian Wang, Ming Tan, Ramesh Nallapati, Parminder Bhatia, Andrew Arnold, Bing Xiang, Dan Roth

Abstract: Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model. Empirical analyses show that, despite the challenging nature of generative tasks, we achieve a 16.5x model footprint compression ratio with little performance drop relative to the full-precision counterparts on multiple summarization and QA datasets. We further push the compression ratio to 27.7x and present the performance-efficiency trade-off for generative tasks using pre-trained models. To the best of our knowledge, this is the first work aiming to effectively distill and quantize sequence-to-sequence pre-trained models for language generation tasks.

* ACL 2022
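
The abstract describes transferring knowledge from a full-precision teacher to a quantized, distilled student. The two ingredients can be illustrated in plain Python: a symmetric uniform quantize-dequantize step ("fake quantization") applied to student weights, and a temperature-softened KL divergence that pulls the student's output distribution toward the teacher's. This is a generic sketch under those assumptions; DQ-BART's actual quantizer, distillation losses, and training procedure are not specified in the abstract, and real quantization-aware training also needs a gradient approximation (such as a straight-through estimator) that is not shown here.

```python
import math


def fake_quantize(weights, num_bits=8):
    """Symmetric, per-tensor uniform quantize-then-dequantize of a list of floats."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp to the signed range, map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def softmax(logits, temperature=1.0):
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened output distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


weights = [0.5, -1.0, 0.25]
print(fake_quantize(weights, num_bits=8))
print(distill_kl([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))
```

Lowering `num_bits` shrinks the weight footprint but increases rounding error, which is the trade-off that distillation from the full-precision teacher is meant to offset.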

Pairwise Supervised Contrastive Learning of Sentence Representations

Sep 12, 2021

Dejiao Zhang, Shang-Wen Li, Wei Xiao, Henghui Zhu, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

Abstract: Many recent successes in sentence representation learning have been achieved by simply fine-tuning on Natural Language Inference (NLI) datasets with triplet loss or siamese loss. Nevertheless, these approaches share a common weakness: sentences in a contradiction pair are not necessarily from different semantic categories. Therefore, optimizing the semantic entailment and contradiction reasoning objective alone is inadequate to capture high-level semantic structure. The drawback is compounded by the fact that the vanilla siamese or triplet losses learn only from individual sentence pairs or triplets and often suffer from bad local optima. In this paper, we propose PairSupCon, an instance-discrimination-based approach that aims to bridge semantic entailment and contradiction understanding with high-level categorical concept encoding. We evaluate PairSupCon on various downstream tasks that involve understanding sentence semantics at different granularities. We outperform the previous state-of-the-art method with 10%-13% averaged improvement on eight clustering tasks and 5%-6% averaged improvement on seven semantic textual similarity (STS) tasks.

* 9 pages, EMNLP 2021
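
PairSupCon is described as instance-discrimination based. For orientation, the sketch below computes a standard in-batch InfoNCE loss over cosine similarities, where each anchor embedding must pick out its own positive from all positives in the batch. This is only the generic building block, written in plain Python; how PairSupCon chooses positives (for example, from NLI pairs), weights negatives, or combines this with entailment and contradiction objectives is not specified in the abstract, and the `temperature` value is an arbitrary assumption.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.05):
    """Mean in-batch InfoNCE loss: anchor i should match positive i.

    All other positives in the batch act as negatives for anchor i.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # equals -log softmax(logits)[i]
    return total / n


anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
print(info_nce(anchors, positives))        # small: pairs are aligned
print(info_nce(anchors, positives[::-1]))  # large: pairs are swapped
```

Unlike a triplet loss, which compares one anchor against one negative, every other example in the batch serves as a negative here, which is the property instance-discrimination methods rely on.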

Improving Factual Consistency of Abstractive Summarization via Question Answering

May 10, 2021

Feng Nan, Cicero Nogueira dos Santos, Henghui Zhu, Patrick Ng, Kathleen McKeown, Ramesh Nallapati, Dejiao Zhang, Zhiguo Wang, Andrew O. Arnold, Bing Xiang

Abstract: A commonly observed problem with state-of-the-art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet inaccurate summaries is a major concern that limits its wide application. In this paper, we present an approach to address factual consistency in summarization. We first propose an efficient automatic evaluation metric to measure factual consistency; next, we propose a novel learning algorithm that maximizes the proposed metric during model training. Through extensive experiments, we confirm that our method is effective in improving factual consistency and even the overall quality of the summaries, as judged by both automatic metrics and human evaluation.

* ACL-IJCNLP 2021

Transductive Learning for Abstractive News Summarization

Apr 17, 2021

Arthur Bražinskas, Mengwen Liu, Ramesh Nallapati, Sujith Ravi, Markus Dreyer

Abstract: Pre-trained language models have recently advanced abstractive summarization. These models are further fine-tuned on human-written references before generating summaries at test time. In this work, we propose the first application of transductive learning to summarization. In this paradigm, a model can learn from the test set's input before inference. To perform transduction, we propose using the summarizing sentences of the input documents to construct references for learning at test time. These sentences are often compressed and fused to form abstractive summaries and provide omitted details and additional context to the reader. We show that our approach yields state-of-the-art results on the CNN/DM and NYT datasets. For instance, we achieve an improvement of over 1 ROUGE-L point on CNN/DM. Further, we show the benefits of transduction from older to more recent news. Finally, through human and automatic evaluation, we show that our summaries become more abstractive and coherent.

Supporting Clustering with Contrastive Learning

Mar 24, 2021

Dejiao Zhang, Feng Nan, Xiaokai Wei, Shangwen Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew Arnold, Bing Xiang

Abstract: Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space. However, different categories often overlap with each other in the representation space at the beginning of the learning process, which poses a significant challenge for distance-based clustering in achieving good separation between different categories. To this end, we propose Supporting Clustering with Contrastive Learning (SCCL), a novel framework that leverages contrastive learning to promote better separation. We assess the performance of SCCL on short text clustering and show that SCCL significantly advances the state-of-the-art results on most benchmark datasets, with 3%-11% improvement in accuracy and 4%-15% improvement in Normalized Mutual Information. Furthermore, our quantitative analysis demonstrates the effectiveness of SCCL in leveraging the strengths of both bottom-up instance discrimination and top-down clustering to achieve better intra-cluster and inter-cluster distances when evaluated with the ground-truth cluster labels.

* NAACL 2021

Entity-level Factual Consistency of Abstractive Text Summarization

Feb 18, 2021

Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, Bing Xiang

Abstract: A key challenge for abstractive summarization is ensuring factual consistency of the generated summary with respect to the original document. For example, state-of-the-art models trained on existing datasets exhibit entity hallucination, generating names of entities that are not present in the source document. We propose a set of new metrics to quantify the entity-level factual consistency of generated summaries, and we show that the entity hallucination problem can be alleviated by simply filtering the training data. In addition, we propose adding a summary-worthy entity classification task to the training process, as well as a joint entity and summary generation approach, both of which yield further improvements in entity-level metrics.

* EACL 2021
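
The abstract proposes entity-level metrics and training-data filtering without giving formulas. A minimal sketch of one natural such metric, the fraction of entities in a summary that also appear in the source document, together with a filter that drops training pairs containing unsupported entities, is shown below. Everything here is an illustrative assumption: `toy_entities` is a hypothetical stand-in for a real NER model (it just collects capitalized words), matching is naive case-insensitive substring search, and the filter works on whole document-summary pairs, which may not match the paper's exact granularity.

```python
import re


def toy_entities(text):
    """Hypothetical stand-in for an NER model: capitalized words."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


def entity_precision_source(summary_entities, source_text):
    """Fraction of summary entities found (case-insensitive substring) in the source."""
    if not summary_entities:
        return 1.0  # assumed convention: no entities means nothing hallucinated
    src = source_text.lower()
    found = sum(1 for e in summary_entities if e.lower() in src)
    return found / len(summary_entities)


def filter_training_pairs(pairs, extract_entities, min_precision=1.0):
    """Keep (document, summary) pairs whose entity precision meets the threshold."""
    return [
        (doc, summ)
        for doc, summ in pairs
        if entity_precision_source(extract_entities(summ), doc) >= min_precision
    ]


doc = "Alice met Bob in Paris on Monday."
pairs = [(doc, "Alice met Bob."), (doc, "Alice met Carol in Paris.")]
print(entity_precision_source(toy_entities(pairs[1][1]), doc))  # "Carol" is unsupported
print(filter_training_pairs(pairs, toy_entities))
```

Substring matching is deliberately crude (a short entity can match inside a longer word); a real implementation would compare normalized entity spans produced by the same NER model on both sides.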

Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction

Nov 26, 2020

Yifan Gao, Henghui Zhu, Patrick Ng, Cicero Nogueira dos Santos, Zhiguo Wang, Feng Nan, Dejiao Zhang, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

Abstract: In open-domain question answering, questions are highly likely to be ambiguous because users may not know the scope of relevant topics when formulating them. Therefore, a system needs to find every possible interpretation of the question and propose a set of disambiguated question-answer pairs. In this paper, we present a model that aggregates and combines evidence from multiple passages to generate question-answer pairs. In particular, our model reads a large number of passages to find as many interpretations as possible. In addition, we propose a novel round-trip prediction approach to generate additional interpretations that our model fails to find in the first pass, and then verify and filter out incorrect question-answer pairs to arrive at the final disambiguated output. On the recently introduced AmbigQA open-domain question answering dataset, our model, named Refuel, achieves new state-of-the-art results, outperforming the previous best model by a large margin. We also conduct comprehensive analyses to validate the effectiveness of our proposed round-trip prediction.

End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems

Oct 12, 2020

Siamak Shakeri, Cicero Nogueira dos Santos, Henry Zhu, Patrick Ng, Feng Nan, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

Abstract: We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token by token. The likelihood produced in the generation process is used as a filtering score, which avoids the need for a separate filtering model. Our generator is trained by fine-tuning a pretrained LM using maximum likelihood estimation. The experimental results indicate significant improvements in the domain adaptation of QA models, outperforming current state-of-the-art methods.

* EMNLP 2020
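
The abstract says the generator's own likelihood doubles as a filtering score, so no separate filtering model is needed. A minimal sketch of that ranking step, assuming per-token log-probabilities for each generated question-answer pair are already available from the decoder, is below. Averaging the log-probabilities (rather than summing them) and keeping a fixed top fraction are assumptions made here so that longer pairs are not penalized simply for their length; `filter_by_likelihood` and `keep_fraction` are hypothetical names.

```python
def mean_log_likelihood(token_logprobs):
    """Length-normalized score: average per-token log-probability."""
    if not token_logprobs:
        return float("-inf")  # an empty generation is ranked last
    return sum(token_logprobs) / len(token_logprobs)


def filter_by_likelihood(samples, keep_fraction=0.5):
    """Keep the highest-scoring (question, answer) pairs.

    `samples` is a list of (question, answer, token_logprobs) tuples, where
    token_logprobs are the decoder's log-probabilities for the generated tokens.
    """
    ranked = sorted(samples, key=lambda s: mean_log_likelihood(s[2]), reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return [(q, a) for q, a, _ in ranked[:k]]


samples = [
    ("Who wrote it?", "Ada", [-0.1, -0.2, -0.1]),
    ("When was it?", "1843", [-2.0, -1.5]),
    ("What is it?", "An engine", [-0.4, -0.6, -0.5]),
]
print(filter_by_likelihood(samples, keep_fraction=0.67))
```

Because the score comes from the same forward pass that produced each pair, filtering adds no extra model to train or serve, which is the efficiency argument the abstract makes.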