Bidirectional Encoder Representations from Transformers (BERT) has brought remarkable improvements across various NLP tasks, and numerous variants have been proposed to further improve the performance of pre-trained models. In this paper, we revisit Chinese pre-trained models to examine their effectiveness in a non-English language and release a series of Chinese pre-trained models to the community. We also propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways, especially the masking strategy. We carry out extensive experiments on various Chinese NLP tasks, covering sentence-level to document-level tasks, to revisit the existing pre-trained models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also provide detailed ablations with several findings that may help future research.
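MacBERT's masking strategy is commonly described as "MLM as correction": masked positions are replaced with similar words rather than an artificial [MASK] placeholder, and the model learns to correct them back. The following is a minimal sketch of that idea only, assuming a hypothetical `get_similar_word` lookup (e.g., a synonym dictionary); it is not the paper's exact procedure.

```python
import random

def mac_style_mask(tokens, get_similar_word, mask_prob=0.15):
    """Illustrative similar-word masking: instead of substituting a [MASK]
    placeholder, selected positions are corrupted with a similar word and
    the model is trained to recover the original token.
    `get_similar_word` is a hypothetical lookup (e.g., a synonym dictionary)."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(get_similar_word(tok) or tok)  # corrupt with a similar word
            labels.append(tok)                           # target: recover the original
        else:
            inputs.append(tok)
            labels.append(None)                          # not predicted at this position
    return inputs, labels

# Hypothetical usage with a tiny synonym table.
print(mac_style_mask("the weather is nice today".split(),
                     lambda w: {"nice": "good"}.get(w), mask_prob=0.3))
```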
Human conversations contain many types of information, e.g., knowledge, common sense, and language habits. In this paper, we propose a conversational word embedding method named PR-Embedding, which utilizes the conversation pairs $ \left\langle{post, reply} \right\rangle$ to learn word embeddings. Unlike previous work, PR-Embedding uses vectors from two different semantic spaces to represent the words in the post and the reply. To capture the information within the pair, we first introduce the word alignment model from statistical machine translation to generate cross-sentence windows, and then train the embeddings at both the word level and the sentence level. We evaluate the method on single-turn and multi-turn response selection tasks for retrieval-based dialog systems. The experimental results show that PR-Embedding can improve the quality of the selected response. The PR-Embedding source code is available at https://github.com/wtma/PR-Embedding
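To make the cross-sentence window concrete, the sketch below shows one plausible way such training pairs could be built from word-alignment links between a post and a reply; the function name, alignment format, and window handling are illustrative assumptions, not PR-Embedding's exact implementation.

```python
def cross_sentence_windows(post, reply, alignments, window=2):
    """Illustrative construction of cross-sentence training pairs, assuming
    `alignments` is a list of (post_idx, reply_idx) links from a word-alignment
    model.  For each link, reply words inside a window around the aligned
    position are paired with the post word, so the two embedding spaces are
    trained on <post, reply> co-occurrences instead of a single sentence."""
    pairs = []
    for p_idx, r_idx in alignments:
        lo, hi = max(0, r_idx - window), min(len(reply), r_idx + window + 1)
        for j in range(lo, hi):
            pairs.append((post[p_idx], reply[j]))  # center word from post, context from reply
    return pairs

# Hypothetical example: alignment links "weather" in the post to "sunny" in the reply.
post  = ["how", "is", "the", "weather", "today"]
reply = ["it", "is", "sunny", "and", "warm"]
print(cross_sentence_windows(post, reply, alignments=[(3, 2)], window=1))
```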
Owing to continuous contributions from the Chinese NLP community, more and more Chinese machine reading comprehension datasets have become available, pushing Chinese MRC research forward. To add diversity to this area, in this paper we propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC). The proposed task aims to fill the right candidate sentences into a passage that has several blanks. Moreover, to add difficulty, we also create fake candidates that are similar to the correct ones, which requires the machine to judge their correctness in context. The proposed dataset contains over 100K blanks (questions) within over 10K passages, which originate from Chinese narrative stories. To evaluate the dataset, we implement several baseline systems based on pre-trained models, and the results show that the state-of-the-art model still underperforms human performance by a large margin. We hope the release of the dataset will further accelerate machine reading comprehension research. Resources available: https://github.com/ymcui/cmrc2019
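To illustrate what a sentence-cloze instance looks like, here is a hypothetical example; the field names and exact format are assumptions for illustration, not the released CMRC 2019 schema.

```python
# A hypothetical SC-MRC instance: a passage with numbered blanks, a candidate
# list that mixes correct and fake sentences, and the gold blank-to-candidate mapping.
example = {
    "passage": "小明早上起床后发现外面下雨了。[BLANK1] 于是他带上了雨伞。[BLANK2]",
    "candidates": [
        "他看了看天气预报，果然全天有雨。",  # correct for BLANK1
        "到了学校他发现自己没有淋湿。",      # correct for BLANK2
        "他决定今天不去上学了。",            # fake candidate: plausible wording, wrong in context
    ],
    "answers": {"[BLANK1]": 0, "[BLANK2]": 1},
}
print(example["answers"])
```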
Recently, many works attempt to model texts as graph structures and introduce graph neural networks to process them in many NLP tasks. In this paper, we investigate whether graph structure is necessary for multi-hop reasoning tasks and what role it plays. Our analysis is centered on HotpotQA. We use the state-of-the-art published model, Dynamically Fused Graph Network (DFGN), as our baseline. By directly modifying the pre-trained model, our baseline gains a large improvement and significantly surpasses both published and unpublished works. Ablation experiments establish that, with the proper use of pre-trained models, graph structure may not be necessary for multi-hop reasoning. We point out that both the graph structure and the adjacency matrix are task-related prior knowledge, and that graph-attention can be considered a special case of self-attention. Experiments demonstrate that graph-attention, or even the entire graph structure, can be replaced by self-attention or Transformers, achieving results similar to those of the previous state-of-the-art model.
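The claim that graph-attention is a special case of self-attention can be illustrated with a masked attention sketch: restricting ordinary self-attention with an adjacency matrix recovers neighbour-only (graph-style) attention. The code below is a generic single-head illustration of this view, not the DFGN or GAT formulation.

```python
import torch
import torch.nn.functional as F

def masked_self_attention(h, adj=None):
    """Single-head dot-product self-attention over node/token states h (n x d).
    Passing an adjacency matrix `adj` (n x n, 1 = edge) restricts each position
    to attend only to its graph neighbours, which is the sense in which
    graph-attention can be viewed as self-attention with a task-specific mask;
    with adj=None it is ordinary, fully connected self-attention."""
    scores = h @ h.t() / h.size(-1) ** 0.5
    if adj is not None:
        scores = scores.masked_fill(adj == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ h

h = torch.randn(5, 16)                                      # 5 nodes, 16-dim states
adj = (torch.rand(5, 5) > 0.5).long().fill_diagonal_(1)     # random graph with self-loops
print(masked_self_attention(h, adj).shape)                  # torch.Size([5, 16])
```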
In this paper, we introduce TextBrewer, an open-source knowledge distillation toolkit designed for natural language processing. It works with different neural network models and supports various kinds of tasks, such as text classification, reading comprehension, and sequence labeling. TextBrewer provides a simple and uniform workflow that enables quick setup of distillation experiments with highly flexible configurations. It offers a set of predefined distillation methods and can be extended with custom code. As a case study, we use TextBrewer to distill BERT on several typical NLP tasks. With simple configurations, we achieve results that are comparable to or even better than state-of-the-art performance. Our toolkit is available at: http://textbrewer.hfl-rc.com
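For readers unfamiliar with distillation, the following is a generic sketch of the core objective such a toolkit automates, namely combining a temperature-scaled soft-label term with the ordinary hard-label loss; it is not TextBrewer's API, whose actual configuration objects and methods are documented at the link above.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Generic knowledge-distillation objective (illustrative, not TextBrewer's API):
    a KL term that pushes the student towards the teacher's temperature-scaled
    output distribution, combined with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Hypothetical shapes: a batch of 8 examples, 3-way classification.
s, t = torch.randn(8, 3), torch.randn(8, 3)
print(distillation_loss(s, t, labels=torch.randint(0, 3, (8,))))
```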
Story Ending Prediction is a task that requires selecting an appropriate ending for a given story, which demands that the machine understand the story and sometimes draw on commonsense knowledge. To tackle this task, we propose a new neural network called Diff-Net for better modeling the differences between endings. The proposed model discriminates between two endings at three semantic levels: contextual representation, story-aware representation, and discriminative representation. Experimental results on the Story Cloze Test dataset show that the proposed model outperforms various systems by a large margin, and detailed ablation studies are provided for a better understanding of our model. We also carefully examine traditional and BERT-based models on both SCT v1.0 and v1.5, with interesting findings that may help future studies.
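To convey the flavour of difference modeling, the sketch below scores two candidate endings by fusing each with the story and exposing their element-wise difference to the classifier; it is a much-simplified illustration under assumed fixed-size vectors, not the full three-level Diff-Net architecture.

```python
import torch
import torch.nn as nn

class DifferenceScorer(nn.Module):
    """Simplified sketch of difference modeling between two candidate endings
    (not the full Diff-Net architecture): each ending is fused with the story,
    and the classifier sees both representations plus their difference, so the
    decision can focus on what distinguishes the endings."""
    def __init__(self, dim=128):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)
        self.classify = nn.Linear(3 * dim, 2)   # logits over the two endings

    def forward(self, story_vec, ending1_vec, ending2_vec):
        e1 = torch.tanh(self.fuse(torch.cat([story_vec, ending1_vec], dim=-1)))
        e2 = torch.tanh(self.fuse(torch.cat([story_vec, ending2_vec], dim=-1)))
        return self.classify(torch.cat([e1, e2, e1 - e2], dim=-1))

scorer = DifferenceScorer()
logits = scorer(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 2])
```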
Recurrent Neural Networks (RNNs) are known as powerful models for handling sequential data and are widely used in various natural language processing tasks. In this paper, we propose Contextual Recurrent Units (CRU) for enhancing local contextual representations in neural networks. The proposed CRU injects convolutional neural networks (CNNs) into the recurrent units to enhance the ability to model local context and reduce word ambiguity, even in bi-directional RNNs. We test our CRU model on sentence-level and document-level NLP tasks: sentiment classification and reading comprehension. Experimental results show that the proposed CRU model gives significant improvements over traditional CNN or RNN models, including their bidirectional variants, as well as over various state-of-the-art systems on both tasks, showing promising extensibility to other NLP tasks.
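The sketch below illustrates the general idea of injecting convolution into a recurrent layer: a 1-D convolution first summarises each word with its local neighbours, and a bidirectional GRU then runs over those contextualised inputs. This is a simplified assumption-level illustration, not the exact CRU formulation, which modifies the recurrent units themselves.

```python
import torch
import torch.nn as nn

class ContextualRecurrentLayer(nn.Module):
    """Simplified sketch of convolution-enhanced recurrence (not the exact CRU):
    a 1-D convolution builds local-context representations of each word, and the
    bidirectional GRU runs over these instead of the raw embeddings."""
    def __init__(self, dim=128, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.rnn = nn.GRU(dim, dim, batch_first=True, bidirectional=True)

    def forward(self, embeddings):            # (batch, seq_len, dim)
        local = self.conv(embeddings.transpose(1, 2)).transpose(1, 2)
        outputs, _ = self.rnn(torch.relu(local))
        return outputs                        # (batch, seq_len, 2 * dim)

layer = ContextualRecurrentLayer()
print(layer(torch.randn(2, 10, 128)).shape)   # torch.Size([2, 10, 256])
```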
Adversarial training (AT) as a regularization method has proven effective in various tasks, such as image classification and text classification. Although AT has been applied successfully to many natural language processing (NLP) tasks, the mechanism behind it is still unclear. In this paper, we apply AT to machine reading comprehension (MRC) and study its effects from multiple perspectives. We experiment with three different kinds of RC tasks: span-based RC, span-based RC with unanswerable questions, and multi-choice RC. The experimental results show that the proposed method can improve performance significantly and universally on SQuAD 1.1, SQuAD 2.0, and RACE. With virtual adversarial training (VAT), we explore the possibility of improving RC models with semi-supervised learning and show that examples from a different task are also beneficial. We also find that AT helps little in defending against artificial adversarial examples, but it helps the model learn better on examples that contain more low-frequency words.
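As background, adversarial training for text is typically applied at the embedding level: the gradient of the task loss with respect to the word embeddings is normalised and added back as a small perturbation, and the model is additionally trained on the perturbed input. The sketch below shows this fast-gradient-style step in generic form; the paper's exact perturbation and loss setup may differ.

```python
import torch

def adversarial_perturbation(embeddings, loss, epsilon=1.0):
    """Illustrative fast-gradient-style perturbation of word embeddings
    (a generic sketch, not the paper's exact setup): normalise the loss
    gradient w.r.t. the embeddings and add it back as a small perturbation,
    to be used in a second forward pass with an adversarial loss term."""
    grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
    norm = grad.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)
    return embeddings + epsilon * grad / norm

# Hypothetical usage: `loss` would normally come from a model forward pass.
emb = torch.randn(2, 5, 8, requires_grad=True)
loss = (emb.sum() ** 2)
print(adversarial_perturbation(emb, loss).shape)  # torch.Size([2, 5, 8])
```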
We observe that the importance of different utterances in the context for selecting the response usually depends on the current query. In this paper, we propose TripleNet, a model that fully captures the task with the triple <context, query, response> instead of the pair <context, response> used in previous works. The heart of TripleNet is a novel attention mechanism named triple attention, which models the relationships within the triple at four levels. The new mechanism updates the representation of each element based on attention to the other two, concurrently and symmetrically. We match the triple <C, Q, R> centered on the response, from the character level to the context level, for prediction. Experimental results on two large-scale multi-turn response selection datasets show that the proposed model significantly outperforms state-of-the-art methods. The TripleNet source code is available at https://github.com/wtma/TripleNet
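The sketch below conveys the symmetric, concurrent update behind triple attention at a single level: each element of the triple attends to the other two and incorporates both summaries. It is a simplified illustration under assumed shapes, not the exact four-level TripleNet formulation.

```python
import torch
import torch.nn.functional as F

def attend(query, memory):
    """Dot-product attention: summarise `memory` for each query position."""
    scores = query @ memory.transpose(-1, -2) / query.size(-1) ** 0.5
    return F.softmax(scores, dim=-1) @ memory

def triple_attention(context, query, response):
    """Simplified sketch of triple attention (not the exact TripleNet formulation):
    each element of <context, query, response> is updated from its attention over
    the other two, concurrently and symmetrically."""
    new_c = context + attend(context, query) + attend(context, response)
    new_q = query + attend(query, context) + attend(query, response)
    new_r = response + attend(response, context) + attend(response, query)
    return new_c, new_q, new_r

c, q, r = torch.randn(2, 20, 64), torch.randn(2, 6, 64), torch.randn(2, 8, 64)
print([t.shape for t in triple_attention(c, q, r)])
```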