
Libo Qin


End-to-end Task-oriented Dialogue: A Survey of Tasks, Methods, and Future Directions

Nov 15, 2023
Libo Qin, Wenbo Pan, Qiguang Chen, Lizi Liao, Zhou Yu, Yue Zhang, Wanxiang Che, Min Li

End-to-end task-oriented dialogue (EToD) can directly generate responses in an end-to-end fashion without modular training, and has attracted escalating popularity. The advancement of deep neural networks, especially the successful use of large pre-trained models, has further led to significant progress in EToD research in recent years. In this paper, we present a thorough review and provide a unified perspective to summarize existing approaches as well as recent trends to advance the development of EToD research. The contributions of this paper can be summarized as follows: (1) First survey: to our knowledge, we take the first step to present a thorough survey of this research field; (2) New taxonomy: we introduce a unified perspective for EToD, including (i) Modularly EToD and (ii) Fully EToD; (3) New frontiers: we discuss some potential frontier areas as well as the corresponding challenges, hoping to spur breakthrough research in the EToD field; (4) Abundant resources: we build a public website (https://etods.net/) that collects the related papers, baseline projects, and leaderboards for the community, where EToD researchers can directly access recent progress. We hope this work can serve as a thorough reference for the EToD research community.

* Accepted at EMNLP2023 

Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages

Oct 23, 2023
Libo Qin, Qiguang Chen, Fuxuan Wei, Shijue Huang, Wanxiang Che

Chain-of-thought (CoT) prompting elicits models to explicitly generate reasoning paths, thus improving reasoning accuracy, and has attracted increasing attention. Specifically, zero-shot CoT achieves remarkable improvements in a wide range of reasoning tasks by simply instructing the LLM with the prompt "Let's think step by step!". Despite the success of zero-shot CoT, the existing zero-shot prompting techniques remain limited to a single language, making it challenging to generalize to other languages and hindering global development. In this work, we introduce cross-lingual prompting (CLP), aiming to improve zero-shot CoT reasoning across languages. Specifically, CLP consists of two main components: (1) cross-lingual alignment prompting and (2) task-specific solver prompting. The cross-lingual alignment prompting is responsible for aligning representations across different languages, whereas the task-specific solver prompting is used to generate the final chain of thoughts and results for the reasoning task. In addition, we further introduce cross-lingual self-consistent prompting (CLSP) to ensemble different reasoning paths across languages. Our experimental evaluations on several benchmarks demonstrate that CLP and CLSP significantly outperform the existing prompting methods and achieve state-of-the-art performance. We hope this work will inspire further breakthroughs in cross-lingual CoT.
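The two-stage prompting described above can be sketched as follows. This is a minimal illustration of the CLP idea, assuming hypothetical template wording and function names; the paper's exact prompts may differ.

```python
# Hedged sketch of cross-lingual prompting (CLP): an alignment prompt
# followed by a task-specific solver prompt. Templates are assumptions.

def build_clp_prompts(question: str, src_lang: str, tgt_lang: str = "English"):
    # Stage 1: cross-lingual alignment prompting -- restate the
    # source-language problem in the target language, aligning key entities.
    align = (f"Please act as an expert in multi-lingual understanding. "
             f"Restate the following {src_lang} problem in {tgt_lang}, "
             f"aligning the key entities:\n{question}")
    # Stage 2: task-specific solver prompting -- generate the chain of
    # thoughts and the final answer (zero-shot CoT style).
    solve = ("After understanding the problem, solve it and give the final "
             "answer. Let's think step by step!")
    return align, solve
```

CLSP would then run such prompt pairs across several target languages and ensemble the resulting answers, e.g. by majority vote.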

* Accepted at EMNLP2023 Main Conference 

Improving Few-shot and Zero-shot Entity Linking with Coarse-to-Fine Lexicon-based Retriever

Aug 13, 2023
Shijue Huang, Bingbing Wang, Libo Qin, Qin Zhao, Ruifeng Xu


Few-shot and zero-shot entity linking focus on tail and emerging entities, which are more challenging but closer to real-world scenarios. The mainstream method is the "retrieve and rerank" two-stage framework. In this paper, we propose a coarse-to-fine lexicon-based retriever that retrieves entity candidates effectively in two layers. The first layer retrieves coarse-grained candidates by leveraging entity names, while the second layer narrows the search down to fine-grained candidates within the coarse-grained ones. This second layer also utilizes entity descriptions to effectively disambiguate tail or new entities that share names with existing popular entities. Experimental results indicate that our approach obtains superior performance without requiring extensive finetuning in the retrieval stage. Notably, our approach ranked 1st in NLPCC 2023 Shared Task 6 on Chinese Few-shot and Zero-shot Entity Linking.
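The two-layer retrieval described above can be sketched as a toy example. Token overlap stands in for the paper's actual lexicon matching, and the scores, names, and cutoffs below are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of coarse-to-fine lexicon-based retrieval: names first,
# then descriptions within the coarse candidate set.

def _tokens(text: str) -> set:
    return set(text.lower().split())

def coarse_to_fine_retrieve(mention, entities, k_coarse=50, k_fine=5):
    m = _tokens(mention)
    # Layer 1 (coarse): rank candidates by lexical overlap with entity names.
    name_score = lambda e: len(m & _tokens(e["name"]))
    coarse = sorted(entities, key=name_score, reverse=True)[:k_coarse]
    # Layer 2 (fine): within the coarse set, also score entity descriptions,
    # which disambiguates tail entities sharing a name with popular ones.
    desc_score = lambda e: len(m & _tokens(e.get("description", "")))
    fine = sorted(coarse, key=lambda e: (name_score(e), desc_score(e)),
                  reverse=True)
    return fine[:k_fine]
```

Given two entities both named "apple", only the one whose description overlaps the mention context survives the fine layer.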

* Accepted to NLPCC2023 

MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System

Jul 14, 2023
Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin Liang, Wanxiang Che, Ruifeng Xu


Multi-modal sarcasm detection has attracted much recent attention. Nevertheless, the existing benchmark (MMSD) has shortcomings that hinder the development of reliable multi-modal sarcasm detection systems: (1) there are spurious cues in MMSD, leading models to learn biases; and (2) the negative samples in MMSD are not always reasonable. To solve these issues, we introduce MMSD2.0, a corrected dataset that fixes the shortcomings of MMSD by removing the spurious cues and re-annotating the unreasonable samples. Meanwhile, we present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives (i.e., the text, image, and text-image interaction views) for multi-modal sarcasm detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for building reliable multi-modal sarcasm detection systems and that multi-view CLIP significantly outperforms the previous best baselines.
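The removal of spurious textual cues can be illustrated with a toy cleaning step. Treating hashtag tokens as the spurious cue is our assumption for illustration only; the dataset's actual correction rules are more involved.

```python
import re

def strip_spurious_cues(text: str) -> str:
    # Illustrative cleaning in the spirit of MMSD2.0's cue removal:
    # drop hashtag tokens, which can leak the sarcasm label to the model.
    # (The hashtag heuristic is our assumption, not the dataset's exact rule.)
    return re.sub(r"#\w+", "", text).strip()
```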

* Accepted by ACL2023 Findings 

OpenSLU: A Unified, Modularized, and Extensible Toolkit for Spoken Language Understanding

May 17, 2023
Libo Qin, Qiguang Chen, Xiao Xu, Yunlong Feng, Wanxiang Che


Spoken Language Understanding (SLU) is one of the core components of a task-oriented dialogue system, aiming to extract the semantic meaning of user queries (e.g., intents and slots). In this work, we introduce OpenSLU, an open-source, unified, modularized, and extensible toolkit for spoken language understanding. Specifically, OpenSLU unifies 10 SLU models for both single-intent and multi-intent scenarios, supporting non-pretrained and pretrained models simultaneously. Additionally, OpenSLU is highly modularized and extensible: it decomposes the model architecture, inference, and learning process into reusable modules, allowing researchers to quickly set up SLU experiments with highly flexible configurations. OpenSLU is implemented based on PyTorch and released at https://github.com/LightChen233/OpenSLU.

* ACL2023 Demo Paper 

A Two-Stage Framework with Self-Supervised Distillation For Cross-Domain Text Classification

Apr 18, 2023
Yunlong Feng, Bohan Li, Libo Qin, Xiao Xu, Wanxiang Che


Cross-domain text classification aims to adapt models to a target domain that lacks labeled data by leveraging rich labeled data from different but related source domain(s) and unlabeled data from the target domain. Previous work focuses on extracting either domain-invariant or task-agnostic features, ignoring domain-aware features that may be present in the target domain and useful for the downstream task. In this paper, we propose a two-stage framework for cross-domain text classification. In the first stage, we finetune the model with masked language modeling (MLM) and labeled data from the source domain. In the second stage, we further fine-tune the model with self-supervised distillation (SSD) and unlabeled data from the target domain. We evaluate its performance on a public cross-domain text classification benchmark, and the experimental results show that our method achieves new state-of-the-art results for both single-source domain adaptation (94.17%, up 1.03%) and multi-source domain adaptation (95.09%, up 1.34%).
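The second stage above can be sketched as a self-distillation objective. This is a simplified reading under our own assumptions: the model's stop-gradient predictions on unlabeled target-domain text act as soft targets for the same model, and the cross-entropy form and names below are illustrative, not the paper's exact loss.

```python
def ssd_loss(teacher_probs, student_logprobs):
    # Self-supervised distillation sketch (stage two): cross-entropy
    # between the model's own soft targets (probabilities, gradients
    # stopped) and its current log-probabilities on the same unlabeled
    # target-domain example. Form and names are assumptions.
    return -sum(p * lq for p, lq in zip(teacher_probs, student_logprobs))
```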


LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model

Apr 13, 2023
Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, Min Zhang, Tat-Seng Chua


Universally modeling all typical information extraction tasks (UIE) with one generative language model (GLM) has shown great potential in recent studies, where various IE predictions are unified into a linearized hierarchical expression under a GLM. Syntactic structure information, an effective type of feature that has been extensively utilized in the IE community, should also be beneficial to UIE. In this work, we propose a novel structure-aware GLM, fully unleashing the power of syntactic knowledge for UIE. A heterogeneous structure inductor is explored to unsupervisedly induce rich heterogeneous structural representations by post-training an existing GLM. In particular, a structural broadcaster is devised to compact various latent trees into explicit high-order forests, helping to guide better generation during decoding. We finally introduce a task-oriented structure fine-tuning mechanism to further adjust the learned structures to best coincide with the end task's needs. On over 12 IE benchmarks across 7 tasks, our system shows significant improvements over the baseline UIE system. Further in-depth analyses show that our GLM learns rich task-adaptive structural bias that greatly resolves the UIE crux: the long-range dependency issue and boundary identification. Source code is open at https://github.com/ChocoWu/LasUIE.

* NeurIPS2022 conference paper 

A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding

Apr 09, 2023
Wenbo Pan, Qiguang Chen, Xiao Xu, Wanxiang Che, Libo Qin


Zero-shot dialogue understanding aims to enable dialogue systems to track the user's needs without any training data, and has gained increasing attention. In this work, we investigate the ability of ChatGPT on zero-shot dialogue understanding tasks, including spoken language understanding (SLU) and dialogue state tracking (DST). Experimental results on four popular benchmarks reveal the great potential of ChatGPT for zero-shot dialogue understanding. In addition, extensive analysis shows that ChatGPT benefits from multi-turn interactive prompting in the DST task but struggles to perform slot filling for SLU. Finally, we summarize several unexpected behaviors of ChatGPT in dialogue understanding tasks, hoping to provide some insights for future research on building zero-shot dialogue understanding systems with large language models (LLMs).
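The multi-turn interactive prompting for DST can be sketched as below. The template wording is a hypothetical example in the spirit of the abstract, not the exact prompt used in the evaluation.

```python
# Hedged sketch: build a zero-shot DST prompt from dialogue history,
# asking the model for the belief state after the latest turn.

def build_dst_prompt(turns, slots):
    lines = ["Track the dialogue state. Slots: " + ", ".join(slots)]
    for speaker, utterance in turns:
        lines.append(f"{speaker}: {utterance}")
    lines.append("Current state (slot=value pairs):")
    return "\n".join(lines)
```

In an interactive setting the prompt would be re-issued after every user turn, letting the model revise the state incrementally.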

* Technical Report 

HIT-SCIR at MMNLU-22: Consistency Regularization for Multilingual Spoken Language Understanding

Jan 05, 2023
Bo Zheng, Zhouyang Li, Fuxuan Wei, Qiguang Chen, Libo Qin, Wanxiang Che


Multilingual spoken language understanding (SLU) consists of two sub-tasks, namely intent detection and slot filling. To improve performance on these two sub-tasks, we propose consistency regularization based on a hybrid data augmentation strategy. The consistency regularization enforces the predicted distributions for an example and its semantically equivalent augmentation to be consistent. We conduct experiments on the MASSIVE dataset under both full-dataset and zero-shot settings. Experimental results demonstrate that our proposed method improves performance on both intent detection and slot filling. Our system (code available at https://github.com/bozheng-hit/MMNLU-22-HIT-SCIR) ranked 1st in the MMNLU-22 competition under the full-dataset setting.
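The consistency term can be sketched as a divergence between the two predicted distributions. A symmetric KL is one common instantiation; the system's exact loss may differ, so treat this as an assumption.

```python
import math

def consistency_loss(p, q, eps=1e-12):
    # Consistency regularization sketch: symmetric KL divergence between
    # the predicted distribution for an example (p) and for its
    # semantically equivalent augmentation (q). Identical predictions
    # incur zero loss; divergent ones are penalized.
    def kl(a, b):
        return sum(ai * math.log((ai + eps) / (bi + eps)) for ai, bi in zip(a, b))
    return 0.5 * (kl(p, q) + kl(q, p))
```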

* Accepted by EMNLP2022 MMNLU-22 Workshop. The winner of the MMNLU-22 Competition Full Dataset Task. Code is available at https://github.com/bozheng-hit/MMNLU-22-HIT-SCIR 