Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hai Zhao

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University

Cross-lingual Dependency Parsing as Domain Adaptation

Dec 24, 2020

Kailai Sun, Zuchao Li, Hai Zhao

Figure 1 for Cross-lingual Dependency Parsing as Domain Adaptation

Figure 2 for Cross-lingual Dependency Parsing as Domain Adaptation

Figure 3 for Cross-lingual Dependency Parsing as Domain Adaptation

Figure 4 for Cross-lingual Dependency Parsing as Domain Adaptation

Abstract:In natural language processing (NLP), cross-lingual transfer learning is as essential as in-domain learning due to the unavailability of annotated resources for low-resource languages. In this paper, we use the ability of a pre-training task that extracts universal features without supervision. We add two pre-training tasks as the auxiliary task into dependency parsing as multi-tasking, which improves the performance of the model in both in-domain and cross-lingual aspects. Moreover, inspired by the usefulness of self-training in cross-domain learning, we combine the traditional self-training and the two pre-training tasks. In this way, we can continuously extract universal features not only in training corpus but also in extra unannotated data and gain further improvement.

Via

Access Paper or Ask Questions

Reference Knowledgeable Network for Machine Reading Comprehension

Dec 07, 2020

Yilin Zhao, Zhuosheng Zhang, Hai Zhao

Figure 1 for Reference Knowledgeable Network for Machine Reading Comprehension

Figure 2 for Reference Knowledgeable Network for Machine Reading Comprehension

Figure 3 for Reference Knowledgeable Network for Machine Reading Comprehension

Figure 4 for Reference Knowledgeable Network for Machine Reading Comprehension

Abstract:Multi-choice Machine Reading Comprehension (MRC) is a major and challenging form of MRC tasks that requires model to select the most appropriate answer from a set of candidates given passage and question. Most of the existing researches focus on the modeling of the task datasets without explicitly referring to external fine-grained commonsense sources, which is a well-known challenge in multi-choice tasks. Thus we propose a novel reference-based knowledge enhancement model based on span extraction called Reference Knowledgeable Network (RekNet), which simulates human reading strategy to refine critical information from the passage and quote external knowledge in necessity. In detail, RekNet refines fine-grained critical information and defines it as Reference Span, then quotes external knowledge quadruples by the co-occurrence information of Reference Span and answer options. Our proposed method is evaluated on two multi-choice MRC benchmarks: RACE and DREAM, which shows remarkable performance improvement with observable statistical significance level over strong baselines.

* 12 pages, 7 figures

Via

Access Paper or Ask Questions

SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task

Oct 11, 2020

Zuchao Li, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita

Figure 1 for SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task

Figure 2 for SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task

Figure 3 for SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task

Abstract:In this paper, we introduced our joint team SJTU-NICT 's participation in the WMT 2020 machine translation shared task. In this shared task, we participated in four translation directions of three language pairs: English-Chinese, English-Polish on supervised machine translation track, German-Upper Sorbian on low-resource and unsupervised machine translation tracks. Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques: document-enhanced NMT, XLM pre-trained language model enhanced NMT, bidirectional translation as a pre-training, reference language based UNMT, data-dependent gaussian prior objective, and BT-BLEU collaborative filtering self-training. We also used the TF-IDF algorithm to filter the training set to obtain a domain more similar set with the test set for finetuning. In our submissions, the primary systems won the first place on English to Chinese, Polish to English, and German to Upper Sorbian translation directions.

* WMT20

Via

Access Paper or Ask Questions

High-order Semantic Role Labeling

Oct 09, 2020

Zuchao Li, Hai Zhao, Rui Wang, Kevin Parnow

Figure 1 for High-order Semantic Role Labeling

Figure 2 for High-order Semantic Role Labeling

Figure 3 for High-order Semantic Role Labeling

Figure 4 for High-order Semantic Role Labeling

Abstract:Semantic role labeling is primarily used to identify predicates, arguments, and their semantic relationships. Due to the limitations of modeling methods and the conditions of pre-identified predicates, previous work has focused on the relationships between predicates and arguments and the correlations between arguments at most, while the correlations between predicates have been neglected for a long time. High-order features and structure learning were very common in modeling such correlations before the neural network era. In this paper, we introduce a high-order graph structure for the neural semantic role labeling model, which enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs. Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to the strong performing SRL models and further boost our baseline to achieve new state-of-the-art results.

* EMNLP 2020, ACL Findings

Via

Access Paper or Ask Questions

Topic-Aware Multi-turn Dialogue Modeling

Sep 26, 2020

Yi Xu, Hai Zhao, Zhuosheng Zhang

Figure 1 for Topic-Aware Multi-turn Dialogue Modeling

Figure 2 for Topic-Aware Multi-turn Dialogue Modeling

Figure 3 for Topic-Aware Multi-turn Dialogue Modeling

Figure 4 for Topic-Aware Multi-turn Dialogue Modeling

Abstract:In the retrieval-based multi-turn dialogue modeling, it remains a challenge to select the most appropriate response according to extracting salient features in context utterances. As a conversation goes on, topic shift at discourse-level naturally happens through the continuous multi-turn dialogue context. However, all known retrieval-based systems are satisfied with exploiting local topic words for context utterance representation but fail to capture such essential global topic-aware clues at discourse-level. Instead of taking topic-agnostic n-gram utterance as processing unit for matching purpose in existing systems, this paper presents a novel topic-aware solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way, so that the resulted model is capable of capturing salient topic shift at discourse-level in need and thus effectively track topic flow during multi-turn conversation. Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network, which matches each topic segment with the response in a dual cross-attention way. Experimental results on three public datasets show TADAM can outperform the state-of-the-art method by a large margin, especially by 3.4% on E-commerce dataset that has an obvious topic shift.

Via

Access Paper or Ask Questions

Document-level Neural Machine Translation with Document Embeddings

Sep 16, 2020

Shu Jiang, Hai Zhao, Zuchao Li, Bao-Liang Lu

Figure 1 for Document-level Neural Machine Translation with Document Embeddings

Figure 2 for Document-level Neural Machine Translation with Document Embeddings

Figure 3 for Document-level Neural Machine Translation with Document Embeddings

Figure 4 for Document-level Neural Machine Translation with Document Embeddings

Abstract:Standard neural machine translation (NMT) is on the assumption of document-level context independent. Most existing document-level NMT methods are satisfied with a smattering sense of brief document-level information, while this work focuses on exploiting detailed document-level context in terms of multiple forms of document embeddings, which is capable of sufficiently modeling deeper and richer document-level context. The proposed document-aware NMT is implemented to enhance the Transformer baseline by introducing both global and local document-level clues on the source end. Experiments show that the proposed method significantly improves the translation performance over strong baselines and other related studies.

* arXiv admin note: substantial text overlap with arXiv:1910.14528

Via

Access Paper or Ask Questions

Graph-to-Sequence Neural Machine Translation

Sep 16, 2020

Sufeng Duan, Hai Zhao, Rui Wang

Figure 1 for Graph-to-Sequence Neural Machine Translation

Figure 2 for Graph-to-Sequence Neural Machine Translation

Figure 3 for Graph-to-Sequence Neural Machine Translation

Figure 4 for Graph-to-Sequence Neural Machine Translation

Abstract:Neural machine translation (NMT) usually works in a seq2seq learning way by viewing either source or target sentence as a linear sequence of words, which can be regarded as a special case of graph, taking words in the sequence as nodes and relationships between words as edges. In the light of the current NMT models more or less capture graph information among the sequence in a latent way, we present a graph-to-sequence model facilitating explicit graph information capturing. In detail, we propose a graph-based SAN-based NMT model called Graph-Transformer by capturing information of subgraphs of different orders in every layers. Subgraphs are put into different groups according to their orders, and every group of subgraphs respectively reflect different levels of dependency between words. For fusing subgraph representations, we empirically explore three methods which weight different groups of subgraphs of different orders. Results of experiments on WMT14 English-German and IWSLT14 German-English show that our method can effectively boost the Transformer with an improvement of 1.1 BLEU points on WMT14 English-German dataset and 1.0 BLEU points on IWSLT14 German-English dataset.

Via

Access Paper or Ask Questions

Multi-span Style Extraction for Generative Reading Comprehension

Sep 15, 2020

Junjie Yang, Zhuosheng Zhang, Hai Zhao

Figure 1 for Multi-span Style Extraction for Generative Reading Comprehension

Figure 2 for Multi-span Style Extraction for Generative Reading Comprehension

Figure 3 for Multi-span Style Extraction for Generative Reading Comprehension

Figure 4 for Multi-span Style Extraction for Generative Reading Comprehension

Abstract:Generative machine reading comprehension (MRC) requires a model to generate well-formed answers. For this type of MRC, answer generation method is crucial to the model performance. However, generative models, which are supposed to be the right model for the task, in generally perform poorly. At the same time, single-span extraction models have been proven effective for extractive MRC, where the answer is constrained to a single span in the passage. Nevertheless, they generally suffer from generating incomplete answers or introducing redundant words when applied to the generative MRC. Thus, we extend the single-span extraction method to multi-span, proposing a new framework which enables generative MRC to be smoothly solved as multi-span extraction. Thorough experiments demonstrate that this novel approach can alleviate the dilemma between generative models and single-span models and produce answers with better-formed syntax and semantics. We will open-source our code for the research community.

Via

Access Paper or Ask Questions

Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Sep 14, 2020

Longxiang Liu, Zhuosheng Zhang, Hai Zhao, Xi Zhou, Xiang Zhou

Figure 1 for Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Figure 2 for Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Figure 3 for Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Figure 4 for Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Abstract:A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles. Thus utterance- and speaker-aware clues are supposed to be well captured in models. However, in the existing retrieval-based multi-turn dialogue modeling, the pre-trained language models (PrLMs) as encoder represent the dialogues coarsely by taking the pairwise dialogue history and candidate response as a whole, the hierarchical information on either utterance interrelation or speaker roles coupled in such representations is not well addressed. In this work, we propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history. In detail, we decouple the contextualized word representations by masking mechanisms in Transformer-based PrLM, making each word only focus on the words in current utterance, other utterances, two speaker roles (i.e., utterances of sender and utterances of receiver), respectively. Experimental results show that our method boosts the strong ELECTRA baseline substantially in four public benchmark datasets, and achieves various new state-of-the-art performance over previous methods. A series of ablation studies are conducted to demonstrate the effectiveness of our method.

* 9 pages, 2 figures, 9 tables

Via

Access Paper or Ask Questions

Composing Answer from Multi-spans for Reading Comprehension

Sep 14, 2020

Zhuosheng Zhang, Yiqing Zhang, Hai Zhao, Xi Zhou, Xiang Zhou

Figure 1 for Composing Answer from Multi-spans for Reading Comprehension

Figure 2 for Composing Answer from Multi-spans for Reading Comprehension

Figure 3 for Composing Answer from Multi-spans for Reading Comprehension

Figure 4 for Composing Answer from Multi-spans for Reading Comprehension

Abstract:This paper presents a novel method to generate answers for non-extraction machine reading comprehension (MRC) tasks whose answers cannot be simply extracted as one span from the given passages. Using a pointer network-style extractive decoder for such type of MRC may result in unsatisfactory performance when the ground-truth answers are given by human annotators or highly re-paraphrased from parts of the passages. On the other hand, using generative decoder cannot well guarantee the resulted answers with well-formed syntax and semantics when encountering long sentences. Therefore, to alleviate the obvious drawbacks of both sides, we propose an answer making-up method from extracted multi-spans that are learned by our model as highly confident $n$-gram candidates in the given passage. That is, the returned answers are composed of discontinuous multi-spans but not just one consecutive span in the given passages anymore. The proposed method is simple but effective: empirical experiments on MS MARCO show that the proposed method has a better performance on accurately generating long answers, and substantially outperforms two competitive typical one-span and Seq2Seq baseline decoders.

Via

Access Paper or Ask Questions