
Xinbei Ma


Multi-turn Dialogue Comprehension from a Topic-aware Perspective

Sep 18, 2023
Xinbei Ma, Yi Xu, Hai Zhao, Zhuosheng Zhang

Dialogue-related machine reading comprehension requires language models to effectively decouple and model multi-turn dialogue passages. As a dialogue develops following the intentions of its participants, its topic may not remain constant throughout the passage. Hence, it is non-trivial to detect and leverage topic shifts in dialogue modeling. Topic modeling, although widely studied for plain text, deserves far more attention in dialogue reading comprehension. This paper proposes to model multi-turn dialogues from a topic-aware perspective. We start with a dialogue segmentation algorithm that splits a dialogue passage into topic-concentrated fragments in an unsupervised way. We then use these fragments as topic-aware language processing units in further dialogue comprehension. On the one hand, the split segments indicate specific topics rather than mixed intentions, which makes them convenient for in-domain topic detection and localization. For this task, we design a clustering system with a self-training auto-encoder and construct two datasets for evaluation. On the other hand, the split segments are appropriate processing elements for multi-turn dialogue response selection. For this purpose, we further present a novel model, the Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements and matches response candidates with dual cross-attention. Empirical studies on three public benchmarks show great improvements over baselines. Our work continues previous studies on document topics and brings dialogue modeling to a novel topic-aware perspective with exhaustive experiments and analyses.
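As a rough illustration of the unsupervised segmentation step (not the paper's algorithm), the sketch below splits a dialogue into topic-concentrated fragments wherever the TF-IDF similarity between adjacent utterances drops below a hypothetical threshold.

# Minimal sketch of unsupervised dialogue topic segmentation, in the spirit of the
# topic-concentrated fragments described above (not the paper's exact algorithm).
# The 0.15 threshold is an arbitrary assumption chosen for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def segment_dialogue(utterances, threshold=0.15):
    # A new fragment starts wherever the similarity between adjacent
    # utterances falls below the threshold (a crude topic-shift signal).
    if len(utterances) <= 1:
        return [list(utterances)]
    vectors = TfidfVectorizer(stop_words="english").fit_transform(utterances)
    segments, current = [], [utterances[0]]
    for i in range(1, len(utterances)):
        sim = cosine_similarity(vectors[i - 1], vectors[i])[0, 0]
        if sim < threshold:          # likely topic shift
            segments.append(current)
            current = []
        current.append(utterances[i])
    segments.append(current)
    return segments

dialogue = [
    "Did you watch the football match last night?",
    "Yes, the football match was incredible in the second half.",
    "By the way, is the project report due on Friday?",
    "The project report is due Friday noon.",
]
print(segment_dialogue(dialogue))  # two fragments: the match vs. the report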


Query Rewriting for Retrieval-Augmented Large Language Models

May 23, 2023
Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan


Large Language Models (LLMs) serve as a powerful \textit{Reader} in the \textit{Retrieve-then-Read} pipeline, making great progress in knowledge-based open-domain tasks. This work introduces a new framework, \textit{Rewrite-Retrieve-Read}, that improves the retrieval-augmented method from the perspective of query rewriting. Prior studies mostly contribute to adapting the retriever or stimulating the reader; in contrast, our approach focuses on adapting the query itself, because the original query is not always optimal for retrieval by the LLM, especially in real-world scenarios. (1) We first prompt an LLM to rewrite the queries, then conduct retrieval-augmented reading. (2) We further apply a small language model as a trainable rewriter, which rewrites the search query to cater to the frozen retriever and the LLM reader. To fine-tune the rewriter, we first use pseudo data for supervised warm-up training. The \textit{Retrieve-then-Read} pipeline is then modeled as a reinforcement learning context, and the rewriter is further trained as a policy model by maximizing the reward of the pipeline performance. Evaluation is performed on two downstream tasks, open-domain QA and multiple-choice QA. Our framework proves effective and scalable.

* Work in progress
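A minimal sketch of the Rewrite-Retrieve-Read flow described above is given below; call_llm and web_search are hypothetical stand-ins for the frozen LLM (rewriter/reader) and the frozen retriever, not calls from any real library.

# Illustrative sketch of the three-step pipeline: rewrite the query, retrieve with
# the rewritten query, then read. The helpers are toy stand-ins for real components.

def call_llm(prompt: str) -> str:
    # Stand-in for an LLM request; replace with a real client call.
    return "LLM output for: " + prompt.splitlines()[-1]

def web_search(query: str, k: int = 3) -> list:
    # Stand-in for a web or dense retriever returning the top-k passages.
    return [f"passage {i} retrieved for '{query}'" for i in range(k)]

def rewrite_retrieve_read(question: str) -> str:
    # 1) Rewrite: ask the LLM (or a small trainable rewriter) for a search query.
    query = call_llm(
        "Think of a search query that helps answer the question.\n"
        f"Question: {question}\nQuery:"
    )
    # 2) Retrieve: send the rewritten query to the frozen retriever.
    passages = web_search(query)
    # 3) Read: the frozen LLM reader answers conditioned on the retrieved context.
    return call_llm(
        "Answer the question based on the context.\n"
        f"Context: {' '.join(passages)}\nQuestion: {question}\nAnswer:"
    )

print(rewrite_retrieve_read("Who wrote the novel that inspired Blade Runner?"))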

PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive Summarization

May 11, 2023
Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan


Building on the remarkable achievements of pre-trained language models in abstractive summarization, the copying mechanism has proved helpful for improving factuality, stability, and overall performance. This work proposes PROM, a new PhRase-level cOpying Mechanism that enhances attention on n-grams and can be applied to zero-shot summarization with pre-training. PROM adds an indicator layer to explicitly pick up tokens in n-grams that can be copied from the source, and calculates an auxiliary loss for the copying prediction. Empirical studies show that PROM yields significant improvements when fine-tuned on benchmarks. In the zero-shot setting, PROM is utilized in self-supervised pre-training on raw corpora and provides new general baselines on a wide range of summarization datasets. Further analysis shows that PROM performs more reasonable copying and contributes to faithfulness.
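The indicator-layer idea can be pictured with a rough PyTorch sketch; this is an assumed illustration of a per-token copy classifier with an auxiliary binary cross-entropy loss, not the released PROM implementation.

# A linear head over encoder states predicts, per source token, whether it belongs
# to an n-gram that should be copied; the BCE term is added to the generation loss.
import torch
import torch.nn as nn

class CopyIndicator(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.head = nn.Linear(hidden_size, 1)   # per-token copy logit
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, encoder_states: torch.Tensor, copy_labels: torch.Tensor):
        # encoder_states: (batch, src_len, hidden); copy_labels: (batch, src_len) in {0, 1}
        logits = self.head(encoder_states).squeeze(-1)        # (batch, src_len)
        aux_loss = self.loss_fn(logits, copy_labels.float())  # auxiliary copying loss
        return logits, aux_loss

# Toy usage with random states; in practice the labels would come from matching
# source n-grams against the reference summary during preprocessing.
batch, src_len, hidden = 2, 16, 32
states = torch.randn(batch, src_len, hidden)
labels = torch.randint(0, 2, (batch, src_len))
indicator = CopyIndicator(hidden)
logits, aux_loss = indicator(states, labels)
total_loss = aux_loss  # in practice: generation_loss + lambda * aux_loss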


Structural Modeling for Dialogue Disentanglement

Oct 15, 2021
Xinbei Ma, Zhuosheng Zhang, Hai Zhao


A tangled multi-party dialogue context poses challenges for dialogue reading comprehension, as multiple dialogue threads flow simultaneously within the same dialogue history, increasing the difficulty of understanding that history for both humans and machines. Dialogue disentanglement aims to clarify conversation threads in a multi-party dialogue history, thus reducing the difficulty of comprehending the long, disordered dialogue passage. Existing studies commonly focus on utterance encoding with carefully designed feature-engineering-based methods but pay inadequate attention to dialogue structure. This work designs a novel model that disentangles a multi-party history into threads by taking dialogue structure features into account. Specifically, based on the fact that dialogues are constructed through the successive participation of speakers and interactions between users of interest, we extract clues from speaker properties and references to users in order to model the structure of a long dialogue record. The proposed method is evaluated on the Ubuntu IRC dataset and achieves state-of-the-art results in dialogue disentanglement.
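The two structural clues named above, speaker property and references to users, can be illustrated with a simple rule-based sketch; the paper's model is neural, so this heuristic is only for intuition.

# Each utterance links to the latest earlier utterance that shares its speaker or
# is addressed to / by that speaker; otherwise it starts a new thread.
def disentangle(utterances):
    # utterances: list of (speaker, text, mentioned_user_or_None)
    threads = []        # each thread is a list of utterance indices
    owner = {}          # utterance index -> thread index
    for i, (speaker, text, mention) in enumerate(utterances):
        parent = None
        for j in range(i - 1, -1, -1):
            prev_speaker, _, prev_mention = utterances[j]
            if prev_speaker == speaker or prev_mention == speaker or prev_speaker == mention:
                parent = j
                break
        if parent is None:
            owner[i] = len(threads)
            threads.append([i])
        else:
            owner[i] = owner[parent]
            threads[owner[parent]].append(i)
    return threads

log = [
    ("alice", "anyone seen the build fail on trunk?", None),
    ("bob", "alice: yes, it's the new compiler flag", "alice"),
    ("carol", "how do I reset my password?", None),
    ("alice", "bob: thanks, reverting it now", "bob"),
]
print(disentangle(log))   # -> [[0, 1, 3], [2]]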


Enhanced Speaker-aware Multi-party Multi-turn Dialogue Comprehension

Sep 09, 2021
Xinbei Ma, Zhuosheng Zhang, Hai Zhao


Multi-party multi-turn dialogue comprehension brings unprecedented challenges in handling the complicated scenarios arising from multiple speakers and the criss-crossed discourse relationships among speaker-aware utterances. Most existing methods treat dialogue contexts as plain text and pay insufficient attention to the crucial speaker-aware clues. In this work, we propose an enhanced speaker-aware model with masking attention and heterogeneous graph networks to comprehensively capture discourse clues from both speaker properties and speaker-aware relationships. With such comprehensive speaker-aware modeling, experimental results show that our model achieves state-of-the-art performance on the benchmark dataset Molweni. Case analysis shows that our model enhances the connections between utterances and their own speakers and captures speaker-aware discourse relations, which are critical for dialogue modeling.
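The masking-attention ingredient can be sketched as follows; this toy example restricts attention to same-speaker utterances and omits the heterogeneous graph component, so it is an illustrative assumption rather than the paper's code.

# Speaker-aware masking over utterance-level attention: each utterance only attends
# to utterances from the same speaker, highlighting speaker-property clues.
import torch

def speaker_mask(speakers):
    # speakers: list of speaker ids per utterance -> (n, n) boolean mask
    n = len(speakers)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        for j in range(n):
            mask[i, j] = speakers[i] == speakers[j]
    return mask

def masked_attention(hidden, mask):
    # hidden: (n, d) utterance representations; mask: (n, n) bool
    scores = hidden @ hidden.T / hidden.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ hidden

speakers = ["A", "B", "A", "C", "B"]
hidden = torch.randn(len(speakers), 8)
out = masked_attention(hidden, speaker_mask(speakers))
print(out.shape)   # torch.Size([5, 8])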
