
Guoping Hu


JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving

Jun 19, 2023
Wayne Xin Zhao, Kun Zhou, Beichen Zhang, Zheng Gong, Zhipeng Chen, Yuanhang Zhou, Ji-Rong Wen, Jing Sha, Shijin Wang, Cong Liu, Guoping Hu

Although pre-trained language models (PLMs) have recently advanced research progress in mathematical reasoning, they are not specially designed as capable multi-task solvers: they suffer from the high cost of multi-task deployment (e.g., one model copy per task) and inferior performance on complex mathematical problems in practical applications. To address these issues, in this paper we propose JiuZhang 2.0, a unified Chinese PLM specially designed for multi-task mathematical problem solving. Our idea is to maintain a moderate-sized model and employ cross-task knowledge sharing to improve the model's capacity in a multi-task setting. Specifically, we construct a Mixture-of-Experts (MoE) architecture for modeling mathematical text, so as to capture the common mathematical knowledge across tasks. To optimize the MoE architecture, we design multi-task continual pre-training and multi-task fine-tuning strategies for multi-task adaptation. These training strategies effectively decompose the knowledge from the task data and establish cross-task sharing via expert networks. To further improve the general capacity for solving different complex tasks, we leverage large language models (LLMs) as complementary models that iteratively refine the solutions generated by our PLM via in-context learning. Extensive experiments demonstrate the effectiveness of our model.
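
A minimal PyTorch sketch of the kind of Mixture-of-Experts feed-forward layer the abstract describes for cross-task knowledge sharing; the class name, expert count, and top-k routing below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a minimal MoE feed-forward layer with top-k token routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=768, d_ff=3072, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)          # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                       # x: (batch, seq, d_model)
        gate = F.softmax(self.router(x), dim=-1)                # routing probabilities
        weights, idx = gate.topk(self.top_k, dim=-1)            # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize kept weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)                       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(2, 16, 768)
print(MoEFeedForward()(x).shape)                                # torch.Size([2, 16, 768])
```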

* Accepted by KDD 2023 ADS track, the 2.0 version of JiuZhang (arxiv:2206.06315v1) 

MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation

May 22, 2023
Jia-Chen Gu, Chao-Hong Tan, Caiyuan Chu, Zhen-Hua Ling, Chongyang Tao, Quan Liu, Cong Liu, Guoping Hu

Modeling multi-party conversations (MPCs) with graph neural networks has proven effective at capturing complicated and graphical information flows. However, existing methods rely heavily on addressee labels and can only be applied to an ideal setting where every utterance is tagged with an addressee label. To address the scarcity of addressee labels, a common issue in MPCs, we propose MADNet, which maximizes the addressee deduction expectation in heterogeneous graph neural networks for MPC generation. Given an MPC with a few addressee labels missing, existing methods fail to build a consecutively connected conversation graph and instead produce only a few separate conversation fragments. To ensure message passing between these conversation fragments, four additional types of latent edges are designed to complete a fully connected graph. In addition, to optimize the edge-type-dependent message passing for utterances without addressee labels, we design an Expectation-Maximization-based method that iteratively generates silver addressee labels (E step) and optimizes the quality of generated responses (M step). Experimental results on two Ubuntu IRC channel benchmarks show that MADNet outperforms various baseline models on the task of MPC generation, especially under the more common and challenging setting where some addressee labels are missing.
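
A toy sketch (assumptions throughout) of the EM alternation described above: the E step fills in silver addressee labels for unlabeled utterances using the current model's scores, and the M step trains the generator on the completed conversation. The scorer and trainer are hypothetical placeholders, and the four latent edge types are not modeled here.

```python
# Hypothetical EM skeleton; `score_addressee` and `train_step` stand in for MADNet's components.
import random

def e_step(utterances, score_addressee):
    """Assign a silver addressee to every utterance that lacks one."""
    for i, utt in enumerate(utterances):
        if utt["addressee"] is None:
            candidates = [u["speaker"] for u in utterances[:i]] or [utt["speaker"]]
            utt["addressee"] = max(candidates, key=lambda c: score_addressee(utt, c))
    return utterances

def em_training(utterances, score_addressee, train_step, rounds=3):
    for _ in range(rounds):
        utterances = e_step(utterances, score_addressee)   # E: complete the graph
        train_step(utterances)                             # M: optimize response generation
    return utterances

# Toy run with random scoring and a no-op trainer.
dialogue = [
    {"speaker": "A", "addressee": None, "text": "anyone tried the new kernel?"},
    {"speaker": "B", "addressee": "A", "text": "yes, works fine here"},
    {"speaker": "C", "addressee": None, "text": "which version?"},
]
em_training(dialogue, lambda u, c: random.random(), lambda utts: None)
print([u["addressee"] for u in dialogue])
```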

* Work in Progress. arXiv admin note: text overlap with arXiv:2203.08500 

SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction

May 21, 2023
Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu

Zero-shot cross-lingual information extraction (IE) aims to construct an IE model for low-resource target languages, given annotations exclusively in rich-resource languages. Recent studies based on language-universal features have shown their effectiveness and are attracting increasing attention. However, prior work has neither explored the potential of establishing interactions between language-universal features and contextual representations nor incorporated features that can effectively model constituent span attributes and relationships between multiple spans. In this study, a syntax-augmented hierarchical interactive encoder (SHINE) is proposed to transfer cross-lingual IE knowledge. The proposed encoder interactively captures complementary information between syntactic features and contextual information, deriving language-agnostic representations for various IE tasks. Concretely, a multi-level interaction network is designed to hierarchically fuse the complementary information and strengthen domain adaptability. Besides the well-studied syntax features of part-of-speech tags and dependency relations, a new syntax feature based on constituency structure is introduced to model constituent span information, which is crucial for IE. Experiments across seven languages on three IE tasks and four benchmarks verify the effectiveness and generalization ability of the proposed method.
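
A rough sketch (not the authors' code) of one interaction layer: contextual token representations attend over embeddings of the language-universal syntax features (POS, dependency relation, constituency label), and the result is fused back through a residual connection. Module names, vocabulary sizes, and the fusion choice are assumptions.

```python
import torch
import torch.nn as nn

class SyntaxInteractionLayer(nn.Module):
    def __init__(self, d_model=768, n_pos=18, n_dep=40, n_const=30, n_heads=8):
        super().__init__()
        self.pos_emb = nn.Embedding(n_pos, d_model)
        self.dep_emb = nn.Embedding(n_dep, d_model)
        self.const_emb = nn.Embedding(n_const, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hidden, pos_ids, dep_ids, const_ids):
        # Stack the three syntax views along the sequence axis as attention memory.
        syntax = torch.cat(
            [self.pos_emb(pos_ids), self.dep_emb(dep_ids), self.const_emb(const_ids)], dim=1
        )
        fused, _ = self.cross_attn(query=hidden, key=syntax, value=syntax)
        return self.norm(hidden + fused)       # residual fusion of syntax-aware signal

layer = SyntaxInteractionLayer()
h = torch.randn(2, 12, 768)
ids = torch.randint(0, 18, (2, 12)), torch.randint(0, 40, (2, 12)), torch.randint(0, 30, (2, 12))
print(layer(h, *ids).shape)                    # torch.Size([2, 12, 768])
```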

* 15 pages 

GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding

May 17, 2023
Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu

Addressing the issue of who says what to whom in multi-party conversations (MPCs) has recently attracted a lot of research attention. However, existing methods for MPC understanding typically embed interlocutors and utterances into sequential information flows, or make only superficial use of the inherent graph structures in MPCs. To address this, we present a plug-and-play and lightweight method named graph-induced fine-tuning (GIFT), which can adapt various Transformer-based pre-trained language models (PLMs) for universal MPC understanding. In detail, the full and equivalent connections among utterances in a regular Transformer ignore the sparse but distinctive dependency of one utterance on another in MPCs. To distinguish different relationships between utterances, four types of edges are designed to integrate graph-induced signals into the attention mechanism, refining PLMs originally designed for processing sequential texts. We evaluate GIFT by implementing it in three PLMs and test the performance on three downstream tasks: addressee recognition, speaker identification, and response selection. Experimental results show that GIFT significantly improves the performance of the three PLMs on the three downstream tasks and two benchmarks with only 4 additional parameters per encoding layer, achieving new state-of-the-art performance on MPC understanding.
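
As a rough reading of the abstract, the graph-induced signal can be sketched as four learnable scalars per encoding layer, one per utterance-to-utterance edge type, added to the attention logits. Everything below (module name, wiring, edge-type ids) is an assumption rather than the released GIFT code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphInducedAttention(nn.Module):
    def __init__(self, d_model=768, n_edge_types=4):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # The "4 additional parameters per encoding layer" mentioned in the abstract.
        self.edge_bias = nn.Parameter(torch.zeros(n_edge_types))
        self.scale = d_model ** -0.5

    def forward(self, x, edge_type):
        # x: (batch, seq, d_model); edge_type: (batch, seq, seq) integer edge-type ids.
        scores = self.q(x) @ self.k(x).transpose(-1, -2) * self.scale
        scores = scores + self.edge_bias[edge_type]      # graph-induced bias per utterance pair
        return F.softmax(scores, dim=-1) @ self.v(x)

attn = GraphInducedAttention()
x = torch.randn(2, 10, 768)
edges = torch.randint(0, 4, (2, 10, 10))
print(attn(x, edges).shape)                              # torch.Size([2, 10, 768])
```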

* Accepted by ACL 2023. arXiv admin note: substantial text overlap with arXiv:2106.01541 

Multi-Stage Coarse-to-Fine Contrastive Learning for Conversation Intent Induction

Mar 09, 2023
Caiyuan Chu, Ya Li, Yifan Liu, Jia-Chen Gu, Quan Liu, Yongxin Ge, Guoping Hu

Intent recognition is critical for task-oriented dialogue systems. However, for emerging domains and new services, it is difficult to accurately identify the key intent of a conversation due to time-consuming data annotation and comparatively poor model transferability. Automatic induction of dialogue intents is therefore very important for intelligent dialogue systems. This paper presents our solution to Track 2 of Intent Induction from Conversations for Task-Oriented Dialogue at the Eleventh Dialogue System Technology Challenge (DSTC11). The essence of intent clustering lies in distinguishing the representations of different dialogue utterances: for any given set of new data, the sentence representations produced by the model should be well separated across different labels. Therefore, we propose a multi-stage coarse-to-fine contrastive learning training scheme, including unsupervised contrastive learning pre-training, supervised contrastive learning pre-training, and fine-tuning with joint contrastive learning and clustering, to obtain a better dialogue utterance representation model for the clustering task. In the released DSTC11 Track 2 evaluation results, our proposed system ranked first on both subtasks of this track.
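
Hedged sketches of the two pre-training objectives named in the scheme, under the assumption that the unsupervised stage uses two encodings of the same utterance as positives (SimCSE-style) and the supervised stage uses label-aware positives; temperatures and exact formulations are illustrative.

```python
import torch
import torch.nn.functional as F

def unsup_contrastive_loss(z1, z2, tau=0.05):
    """z1, z2: two encodings of the same batch of utterances, shape (batch, dim)."""
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / tau
    targets = torch.arange(z1.size(0))                  # positive is the diagonal
    return F.cross_entropy(sim, targets)

def sup_contrastive_loss(z, labels, tau=0.05):
    """Pull together utterances sharing an intent label, push apart the rest."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau
    mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    mask.fill_diagonal_(0)                              # exclude self-pairs from positives
    denom = sim.masked_fill(torch.eye(len(z)).bool(), -1e9)
    log_prob = sim - torch.logsumexp(denom, dim=1, keepdim=True)
    return -(mask * log_prob).sum(1).div(mask.sum(1).clamp(min=1)).mean()

z1, z2 = torch.randn(8, 256), torch.randn(8, 256)
labels = torch.randint(0, 3, (8,))
print(unsup_contrastive_loss(z1, z2).item(), sup_contrastive_loss(z1, labels).item())
```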

* Ranked 1st on Track 2 at DSTC 11, Accepted by DSTC 11 Workshop 

SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

Oct 12, 2021
Jiaan Wang, Zhixu Li, Qiang Yang, Jianfeng Qu, Zhigang Chen, Qingsheng Liu, Guoping Hu

Sports game summarization aims to generate news articles from live text commentaries. A recent state-of-the-art work, SportsSum, not only constructs a large benchmark dataset but also proposes a two-step framework. Despite its contributions, the work has three main drawbacks: 1) noise in the SportsSum dataset degrades summarization performance; 2) neglecting the lexical overlap between news and commentaries results in a low-quality pseudo-labeling algorithm; 3) directly concatenating rewritten sentences to form news limits practicality. In this paper, we publish a new benchmark dataset, SportsSum2.0, together with a modified summarization framework. In particular, to obtain a clean dataset, we employ crowd workers to manually clean the original dataset. Moreover, the degree of lexical overlap is incorporated into the generation of pseudo labels. Further, we introduce a reranker-enhanced summarizer that takes into account the fluency and expressiveness of the summarized news. Extensive experiments show that our model outperforms the state-of-the-art baseline.
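
A toy illustration of incorporating lexical overlap into pseudo-labeling: each news sentence is paired with the commentary line that has the highest token-overlap score above a threshold. This is only my reading of the general idea; the paper's actual labeling algorithm is more elaborate, and the function names are made up.

```python
def token_overlap(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)          # Jaccard similarity over tokens

def pseudo_label(news_sentences, commentaries, threshold=0.2):
    pairs = []
    for sent in news_sentences:
        scores = [(token_overlap(sent, c), c) for c in commentaries]
        best_score, best_commentary = max(scores)
        if best_score >= threshold:                     # drop low-overlap alignments
            pairs.append((best_commentary, sent))
    return pairs

news = ["Smith scored the opening goal in the 12th minute"]
commentary = ["12' Smith scored a brilliant opening goal", "45' Half time"]
print(pseudo_label(news, commentary))
```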

* Accepted as a short paper in CIKM 2021 

Adversarial Training for Machine Reading Comprehension with Virtual Embeddings

Jun 08, 2021
Ziqing Yang, Yiming Cui, Chenglei Si, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu

Adversarial training (AT) as a regularization method has proved effective on various tasks. Though there are successful applications of AT to some NLP tasks, the distinguishing characteristics of NLP tasks have not been exploited. In this paper, we aim to apply AT to machine reading comprehension (MRC) tasks. Furthermore, we adapt AT for MRC by proposing a novel adversarial training method called PQAT that perturbs the embedding matrix instead of word vectors. To differentiate the roles of passages and questions, PQAT uses additional virtual P/Q-embedding matrices to gather the global perturbations of words from passages and questions separately. We test the method on a wide range of MRC tasks, including span-based extractive RC and multiple-choice RC. The results show that adversarial training is universally effective, and PQAT further improves the performance.
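
A sketch of adversarial training applied to the embedding matrix in the standard FGM style, which is the closest common technique to what the abstract describes; PQAT's virtual P/Q-embedding matrices are not reproduced here, and the tiny demo model and function names are invented.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self, vocab=100, dim=32, n_cls=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.cls = nn.Linear(dim, n_cls)
    def forward(self, ids):
        return self.cls(self.emb(ids).mean(dim=1))

def fgm_on_embedding(model, loss_fn, batch, epsilon=1.0):
    emb = model.emb.weight
    loss = loss_fn(model, batch)
    loss.backward()                           # gradients for the clean loss
    grad = emb.grad.detach()
    delta = epsilon * grad / (grad.norm() + 1e-12)
    emb.data.add_(delta)                      # perturb the whole embedding matrix
    adv_loss = loss_fn(model, batch)
    adv_loss.backward()                       # accumulate gradients from the perturbed loss
    emb.data.sub_(delta)                      # restore the original weights
    return loss.item(), adv_loss.item()

model = TinyModel()
ids = torch.randint(0, 100, (4, 8))
labels = torch.randint(0, 2, (4,))
loss_fn = lambda m, b: nn.functional.cross_entropy(m(b[0]), b[1])
print(fgm_on_embedding(model, loss_fn, (ids, labels)))
```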

* Accepted to *SEM 2021 workshop at ACL 2021 

Memory Augmented Sequential Paragraph Retrieval for Multi-hop Question Answering

Feb 07, 2021
Nan Shao, Yiming Cui, Ting Liu, Shijin Wang, Guoping Hu

Retrieving information from correlated paragraphs or documents to answer open-domain multi-hop questions is very challenging. To deal with this challenge, most existing works consider paragraphs as nodes in a graph and propose graph-based methods to retrieve them. In this paper, however, we point out an intrinsic defect of such methods. Instead, we propose a new architecture that models paragraphs as sequential data and treats multi-hop information retrieval as a kind of sequence labeling task. Specifically, we design a rewritable external memory to model the dependency among paragraphs. Moreover, a threshold gate mechanism is proposed to eliminate the distraction of noisy paragraphs. We evaluate our method on both the full wiki and distractor subtasks of HotpotQA, a public textual multi-hop QA dataset requiring multi-hop information retrieval. Experiments show that our method achieves significant improvement over the published state-of-the-art method in retrieval and downstream QA task performance.
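
A rough sketch of casting paragraph retrieval as sequence labeling: a GRU running over paragraph encodings stands in for a simple recurrent memory, and a threshold gate keeps only paragraphs whose relevance score clears a cutoff. The paper's rewritable external memory is more elaborate; names and dimensions here are assumptions.

```python
import torch
import torch.nn as nn

class SequentialParagraphLabeler(nn.Module):
    def __init__(self, d_para=768, d_hidden=256, threshold=0.5):
        super().__init__()
        self.threshold = threshold
        self.memory_rnn = nn.GRU(d_para, d_hidden, batch_first=True)
        self.scorer = nn.Linear(d_hidden, 1)

    def forward(self, paragraph_reprs):                   # (batch, n_paragraphs, d_para)
        states, _ = self.memory_rnn(paragraph_reprs)      # carry cross-paragraph context
        scores = torch.sigmoid(self.scorer(states)).squeeze(-1)
        keep = scores > self.threshold                    # threshold gate on noisy paragraphs
        return scores, keep

model = SequentialParagraphLabeler()
reprs = torch.randn(2, 6, 768)
scores, keep = model(reprs)
print(scores.shape, keep.sum().item())                    # torch.Size([2, 6]) and number kept
```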

* 10 pages 

Unsupervised Explanation Generation for Machine Reading Comprehension

Nov 13, 2020
Yiming Cui, Ting Liu, Shijin Wang, Guoping Hu

With the blooming of various pre-trained language models (PLMs), machine reading comprehension (MRC) has seen significant improvements on various benchmarks and even surpasses human performance. However, existing works target only the accuracy of the final predictions and neglect the importance of explanations for those predictions, which is a big obstacle to convincing humans when these models are used in real-life applications. In this paper, we propose a self-explainable framework for the machine reading comprehension task. The main idea is that the proposed system tries to use less passage information while achieving results similar to a system that uses the whole passage; the filtered passage is then used as the explanation. We carried out experiments on three multiple-choice MRC datasets and found that the proposed system achieves consistent improvements over baseline systems. To evaluate explainability, we compared our approach with the traditional attention mechanism in human evaluations and found that the proposed system has a notable advantage over the latter.
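
A toy illustration (with a placeholder confidence function, not the authors' system) of the selection idea: greedily drop passage sentences as long as the model's answer confidence stays close to the full-passage confidence, and keep the remainder as the explanation.

```python
def select_explanation(sentences, confidence, tolerance=0.05):
    full = confidence(sentences)
    kept = list(sentences)
    for sent in sentences:
        trial = [s for s in kept if s != sent]
        if trial and confidence(trial) >= full - tolerance:
            kept = trial                                  # sentence was not needed for the answer
    return kept

# Hypothetical confidence function: rewards sentences containing the answer keyword.
passage = ["The match was in Paris.", "Rain delayed kickoff.", "Smith scored twice."]
conf = lambda sents: 0.9 if any("Smith" in s for s in sents) else 0.3
print(select_explanation(passage, conf))                  # ['Smith scored twice.']
```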

* 10 pages 

CharBERT: Character-aware Pre-trained Language Model

Nov 03, 2020
Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang, Guoping Hu

Most pre-trained language models (PLMs) construct word representations at the subword level with Byte-Pair Encoding (BPE) or its variations, by which OOV (out-of-vocabulary) words are almost entirely avoided. However, these methods split a word into subword units, making the representation incomplete and fragile. In this paper, we propose a character-aware pre-trained language model named CharBERT that improves on previous methods (such as BERT and RoBERTa) to tackle these problems. We first construct a contextual word embedding for each token from the sequential character representations, then fuse the character representations and the subword representations with a novel heterogeneous interaction module. We also propose a new pre-training task named NLM (Noisy LM) for unsupervised character representation learning. We evaluate our method on question answering, sequence labeling, and text classification tasks, on both the original datasets and adversarial misspelling test sets. The experimental results show that our method can significantly improve the performance and robustness of PLMs simultaneously. Pre-trained models, evaluation sets, and code are available at https://github.com/wtma/CharBERT
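
A sketch of the general recipe the abstract outlines, with invented module names: a character-level bi-GRU produces one vector per token from its characters, which is then fused with the subword embedding. CharBERT's heterogeneous interaction module is richer than this concatenate-and-project fusion.

```python
import torch
import torch.nn as nn

class CharAwareEmbedding(nn.Module):
    def __init__(self, n_chars=300, d_char=64, vocab=30522, d_model=768):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_char, padding_idx=0)
        self.char_rnn = nn.GRU(d_char, d_char, batch_first=True, bidirectional=True)
        self.subword_emb = nn.Embedding(vocab, d_model)
        self.fuse = nn.Linear(d_model + 2 * d_char, d_model)

    def forward(self, subword_ids, char_ids):
        # subword_ids: (batch, seq); char_ids: (batch, seq, max_chars_per_token)
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids).view(b * s, c, -1)
        _, h = self.char_rnn(chars)                       # h: (2, b*s, d_char), final states
        char_vec = h.transpose(0, 1).reshape(b, s, -1)    # one char-level vector per token
        fused = self.fuse(torch.cat([self.subword_emb(subword_ids), char_vec], dim=-1))
        return fused

emb = CharAwareEmbedding()
out = emb(torch.randint(1, 30522, (2, 8)), torch.randint(1, 300, (2, 8, 12)))
print(out.shape)                                          # torch.Size([2, 8, 768])
```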

* 12 pages, to appear at COLING 2020 