Yitao Cai

Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering

Sep 15, 2021

Zhe Lin, Yitao Cai, Xiaojun Wan

Abstract: Paraphrase generation is an important task in natural language processing. Previous work focuses on sentence-level paraphrase generation and ignores document-level paraphrase generation, which is a more challenging and valuable task. In this paper, we explore the task of document-level paraphrase generation for the first time and focus on inter-sentence diversity by considering sentence rewriting and reordering. We propose CoRPG (Coherence Relationship guided Paraphrase Generation), which uses a graph GRU to encode the coherence relationship graph and obtain a coherence-aware representation for each sentence. These representations can be used to rearrange the multiple (possibly modified) input sentences. We create a pseudo document-level paraphrase dataset for training CoRPG. Automatic evaluation results show that CoRPG outperforms several strong baseline models on BERTScore and diversity scores. Human evaluation also shows that our model can generate document paraphrases with more diversity and better semantic preservation.

* Findings of EMNLP 2021

Making Better Use of Bilingual Information for Cross-Lingual AMR Parsing

Jun 09, 2021

Yitao Cai, Zhe Lin, Xiaojun Wan

Abstract: Abstract Meaning Representation (AMR) is a rooted, labeled, acyclic graph representing the semantics of natural language. As previous work shows, although AMR was originally designed for English, it can also represent semantics in other languages. However, that work finds that the concepts in the predicted AMR graphs are less specific. We argue that the misprediction of concepts is due to the high relevance between English tokens and AMR concepts. In this work, we introduce bilingual input, namely the translated texts as well as the non-English texts, to enable the model to predict more accurate concepts. We also introduce an auxiliary task that requires the decoder to predict the English sequences at the same time. The auxiliary task can help the decoder understand what exactly the corresponding English tokens are. Our proposed cross-lingual AMR parser surpasses the previous state-of-the-art parser by 10.6 points on Smatch F1 score. The ablation study also demonstrates the efficacy of our proposed modules.

* Findings of ACL 2021
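
As a rough illustration of the Smatch F1 score reported in the abstract above, the sketch below computes an F1 over AMR triples. It assumes the variable names of the predicted and gold graphs are already aligned; the actual Smatch metric additionally searches for the variable mapping that maximizes the number of matched triples. The function name `triple_f1` and the example triples are hypothetical and not taken from the paper.

```python
def triple_f1(predicted, gold):
    """F1 over two collections of (relation, source, target) triples.

    Assumption: variables in both graphs are already aligned, so a
    triple matches only if it is identical in both collections.
    """
    predicted, gold = set(predicted), set(gold)
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical triples for "The boy wants to go", with the predicted
# graph missing the "go" concept and its argument edge.
gold_triples = [
    ("instance", "w", "want-01"),
    ("instance", "b", "boy"),
    ("instance", "g", "go-02"),
    ("ARG0", "w", "b"),
    ("ARG1", "w", "g"),
]
predicted_triples = [
    ("instance", "w", "want-01"),
    ("instance", "b", "boy"),
    ("ARG0", "w", "b"),
]
score = triple_f1(predicted_triples, gold_triples)
```

Here the prediction has perfect precision but recovers only three of the five gold triples, so the F1 falls below 1.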

IGSQL: Database Schema Interaction Graph Based Neural Model for Context-Dependent Text-to-SQL Generation

Nov 11, 2020

Yitao Cai, Xiaojun Wan

Abstract: The context-dependent text-to-SQL task has drawn much attention in recent years. Previous models for this task concentrate only on utilizing historical user inputs. In this work, in addition to using encoders to capture historical information from user inputs, we propose a database schema interaction graph encoder to utilize historical information about database schema items. In the decoding phase, we introduce a gate mechanism to weigh the importance of different vocabularies and then predict SQL tokens. We evaluate our model on the benchmark SParC and CoSQL datasets, which are two large, complex, context-dependent, cross-domain text-to-SQL datasets. Our model outperforms the previous state-of-the-art model by a large margin and achieves new state-of-the-art results on the two datasets. The comparison and ablation results demonstrate the efficacy of our model and the usefulness of the database schema interaction graph encoder.

* EMNLP 2020
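
The IGSQL abstract above mentions a gate that weighs different vocabularies before SQL tokens are predicted, without giving its form. The sketch below shows one generic way such a gate could work, under stated assumptions: two vocabularies (SQL keywords and schema items), a scalar gate obtained from a sigmoid, and a softmax over each vocabulary. All names (`gated_token_distribution`, `keyword_scores`, `schema_scores`, `gate_logit`) are hypothetical, and this is not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_token_distribution(keyword_scores, schema_scores, gate_logit):
    """Mix two vocabulary distributions with a scalar gate.

    Assumption: the gate assigns weight g to the SQL keyword vocabulary
    and (1 - g) to the schema item vocabulary. The result is a single
    distribution over the concatenated vocabularies.
    """
    gate = sigmoid(gate_logit)
    p_keyword = softmax(keyword_scores)
    p_schema = softmax(schema_scores)
    return [gate * p for p in p_keyword] + [(1.0 - gate) * p for p in p_schema]


# Two keyword candidates, three schema candidates, and a neutral gate.
dist = gated_token_distribution([1.0, 2.0], [0.5, 0.5, 0.5], gate_logit=0.0)
```

Because each softmax sums to 1 and the gate weights sum to 1, the combined list is a valid probability distribution over both vocabularies.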
