Xinting Huang

TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design

Sep 11, 2023
Yongrui Chen, Haiyun Jiang, Xinting Huang, Shuming Shi, Guilin Qi

High-quality instruction-tuning data is critical to improving LLM capabilities. Existing data collection methods are limited by unrealistic manual labeling costs or by the hallucination that comes with relying solely on LLM generation. To address these problems, this paper presents a scalable method for automatically collecting high-quality instruction-tuning data by training language models to design tasks based on human-written texts. Intuitively, grounding in human-written text helps the model attenuate hallucinations during task generation. Unlike instruction back-translation-based methods that directly take the given text as the response, we require the model to generate the instruction, input, and output simultaneously, which helps filter out noise. Results from both automatic and manual evaluation experiments demonstrate the quality of our dataset.
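
The core idea lends itself to a small sketch. Below is a minimal, illustrative pipeline (not the authors' exact implementation): an LLM is prompted to produce the instruction, input, and output together from a source text, and simple heuristics drop malformed or ungrounded generations. `llm_generate` is a hypothetical stand-in for the trained task-design model, and the 0.3 overlap threshold is an arbitrary illustrative choice.

```python
# A minimal sketch of text-grounded task design; llm_generate is a hypothetical
# stand-in for the trained task-design model, and the filters are illustrative.
import json
import re
from typing import Optional

def llm_generate(prompt: str) -> str:
    """Placeholder for a model call; wire up a task-design model here."""
    raise NotImplementedError

PROMPT_TEMPLATE = (
    "Given the following human-written text, design one task grounded in it.\n"
    "Return JSON with keys 'instruction', 'input', and 'output'.\n\nText:\n{text}"
)

def design_task(text: str) -> Optional[dict]:
    """Generate instruction/input/output simultaneously, then filter noise."""
    raw = llm_generate(PROMPT_TEMPLATE.format(text=text))
    try:
        task = json.loads(raw)
    except json.JSONDecodeError:
        return None  # drop malformed generations
    if not isinstance(task, dict):
        return None
    # All three fields must be present and non-empty.
    if not all(str(task.get(k, "")).strip() for k in ("instruction", "input", "output")):
        return None
    # Heuristic grounding check: the output should overlap with the source text.
    source_tokens = set(re.findall(r"\w+", text.lower()))
    output_tokens = set(re.findall(r"\w+", str(task["output"]).lower()))
    if len(output_tokens & source_tokens) / max(len(output_tokens), 1) < 0.3:
        return None  # likely ungrounded, i.e., hallucinated content
    return task
```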

* Work in progress 

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Sep 03, 2023
Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.
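
As one concrete illustration of a detection family that surveys of this kind cover, the sketch below implements sampling-based consistency checking: an answer is flagged as a possible hallucination when independent stochastic samples disagree. `sample_answers` is a hypothetical model interface, and the exact-match voting is a deliberate simplification of real semantic-consistency scoring.

```python
# Illustrative sampling-based hallucination check: low agreement across
# independent samples suggests the model may be confabulating.
from collections import Counter
from typing import List

def sample_answers(question: str, n: int = 5) -> List[str]:
    """Placeholder: draw n stochastic answers from an LLM (temperature > 0)."""
    raise NotImplementedError

def consistency_score(question: str, n: int = 5) -> float:
    """Fraction of samples agreeing with the majority answer; low => suspect."""
    answers = [a.strip().lower() for a in sample_answers(question, n)]
    count = Counter(answers).most_common(1)[0][1]
    return count / len(answers)
```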

* Work in progress; 32 pages 

Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration

Jun 15, 2023
Chenyang Lyu, Minghao Wu, Longyue Wang, Xinting Huang, Bingshuai Liu, Zefeng Du, Shuming Shi, Zhaopeng Tu

Although instruction-tuned large language models (LLMs) have exhibited remarkable capabilities across various NLP tasks, their effectiveness on data modalities beyond text has not been fully studied. In this work, we propose Macaw-LLM, a novel multi-modal LLM that seamlessly integrates visual, audio, and textual information. Macaw-LLM consists of three main components: a modality module for encoding multi-modal data, a cognitive module for harnessing pretrained LLMs, and an alignment module for harmonizing diverse representations. Our novel alignment module seamlessly bridges multi-modal features to textual features, simplifying the adaptation process from the modality modules to the cognitive module. In addition, we construct a large-scale multi-modal instruction dataset of multi-turn dialogues, including 69K image instances and 50K video instances. We have made our data, code, and model publicly available, which we hope can pave the way for future research in multi-modal LLMs and expand the capabilities of LLMs to handle diverse data modalities and address complex real-world scenarios.
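
To make the alignment idea concrete, here is a minimal PyTorch sketch of projecting modality-encoder outputs into an LLM's textual embedding space as a fixed number of "soft tokens". The learnable-query attention pooling and all dimensions are illustrative assumptions, not Macaw-LLM's exact design.

```python
# Sketch of an alignment module: compress variable-length modality features
# into fixed-size pseudo-token embeddings the LLM can consume.
import torch
import torch.nn as nn

class AlignmentModule(nn.Module):
    def __init__(self, modality_dim: int, llm_dim: int, num_queries: int = 32):
        super().__init__()
        # Learnable queries pool the modality sequence into num_queries slots.
        self.queries = nn.Parameter(torch.randn(num_queries, llm_dim))
        self.proj = nn.Linear(modality_dim, llm_dim)
        self.attn = nn.MultiheadAttention(llm_dim, num_heads=8, batch_first=True)

    def forward(self, modality_feats: torch.Tensor) -> torch.Tensor:
        # modality_feats: (batch, seq_len, modality_dim) from an image/audio/video encoder
        kv = self.proj(modality_feats)
        q = self.queries.unsqueeze(0).expand(modality_feats.size(0), -1, -1)
        aligned, _ = self.attn(q, kv, kv)  # (batch, num_queries, llm_dim)
        return aligned  # prepend to text token embeddings before the LLM

# Usage: align 1024-d visual features to a hypothetical 4096-d LLM.
align = AlignmentModule(modality_dim=1024, llm_dim=4096)
soft_tokens = align(torch.randn(2, 257, 1024))  # -> (2, 32, 4096)
```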

* Longyue Wang is the corresponding author. Our project page is at https://github.com/lyuchenyang/Macaw-LLM 

Pre-training Multi-party Dialogue Models with Latent Discourse Inference

May 24, 2023
Yiyang Li, Xinting Huang, Wei Bi, Hai Zhao

Multi-party dialogues are more difficult for models to understand than one-to-one two-party dialogues, since they involve multiple interlocutors, resulting in interweaving reply-to relations and information flows. To overcome these obstacles, an effective approach is to pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying. However, due to the lack of explicitly annotated discourse labels in multi-party dialogue corpora, previous works fail to scale up the pre-training process, leaving the unlabeled multi-party conversational data unused. To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model via unsupervised latent variable inference methods. Experiments on multiple downstream tasks show that our pre-trained model outperforms strong baselines by large margins and achieves state-of-the-art (SOTA) results, justifying the effectiveness of our method. The official implementation of this paper is available at https://github.com/EricLee8/MPD_EMVI.
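
A small sketch can illustrate the latent-variable treatment of discourse structure: score every earlier utterance as a candidate parent and sample a differentiable soft reply-to assignment with Gumbel-softmax. This shows the general idea only; the paper's actual inference procedure may differ.

```python
# Sketch of latent reply-to inference: each utterance gets a soft distribution
# over earlier utterances as its parent, kept differentiable via Gumbel-softmax.
import torch
import torch.nn.functional as F

def latent_reply_to(utt_states: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """utt_states: (num_utts, dim). Returns a soft parent per utterance."""
    num_utts, dim = utt_states.shape
    scores = utt_states @ utt_states.t() / dim ** 0.5  # pairwise compatibility
    # An utterance can only reply to an earlier one: mask self and future.
    mask = torch.triu(torch.ones(num_utts, num_utts, dtype=torch.bool))
    scores = scores.masked_fill(mask, float("-inf"))
    # The first utterance has no parent, so it is excluded from inference.
    parents = F.gumbel_softmax(scores[1:], tau=tau, hard=False)
    return parents  # row i: soft parent distribution of utterance i + 1

parents = latent_reply_to(torch.randn(5, 128))  # -> (4, 5)
```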

* Accepted by ACL 2023 

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

May 22, 2023
Yue Zhang, Leyang Cui, Deng Cai, Xinting Huang, Tao Fang, Wei Bi

ChatGPT and GPT-4 have attracted substantial interest from both academic and industrial circles, owing to their remarkable few-shot (or even zero-shot) ability to handle various tasks. Recent work shows that, after being fine-tuned on a small amount of instruction-driven data, the recently proposed LLM, LLaMa, exhibits an impressive capability to address a broad range of tasks. However, the zero-shot performance of LLMs does not consistently outperform that of models fine-tuned for specific scenarios. To explore whether the capabilities of LLMs can be further enhanced for specific scenarios, we choose the writing-assistance scenario as the testbed, covering seven writing tasks. We collect training data for these tasks, reframe them in an instruction-following format, and subsequently refine LLaMa via instruction tuning. Experimental results show that continually fine-tuning LLaMa on writing instruction data significantly improves its ability on writing tasks. We also conduct further experiments and analyses to offer insights for future work on effectively fine-tuning LLaMa for specific scenarios.
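
The instruction-reframing step mentioned in the abstract amounts to wrapping each supervised writing example in a task template. A toy sketch follows, with templates that are illustrative rather than the paper's actual ones:

```python
# Toy reframing of supervised writing data into instruction-following records
# (Alpaca-style fields). Templates here are illustrative placeholders.
TEMPLATES = {
    "grammar": "Fix all grammatical errors in the following text.",
    "paraphrase": "Rewrite the following text while preserving its meaning.",
    "simplify": "Simplify the following text so it is easier to read.",
}

def reframe(task: str, source: str, target: str) -> dict:
    return {
        "instruction": TEMPLATES[task],
        "input": source,
        "output": target,
    }

example = reframe(
    "grammar",
    "She go to school yesterday.",
    "She went to school yesterday.",
)
```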

* Work in progress 

Effidit: Your AI Writing Assistant

Aug 04, 2022
Shuming Shi, Enbo Zhao, Duyu Tang, Yan Wang, Piji Li, Wei Bi, Haiyun Jiang, Guoping Huang, Leyang Cui, Xinting Huang, Cong Zhou, Yong Dai, Dongyang Ma

In this technical report, we introduce Effidit (Efficient and Intelligent Editing), a digital writing assistant that helps users write higher-quality text more efficiently through artificial intelligence (AI) technologies. Previous writing assistants typically provide error checking (to detect and correct spelling and grammatical errors) and limited text-rewriting functionality. With the emergence of large-scale neural language models, some systems support automatically completing a sentence or a paragraph. In Effidit, we significantly expand the capabilities of a writing assistant by providing functions in five categories: text completion, error checking, text polishing, keywords to sentences (K2S), and cloud input methods (cloud IME). In the text completion category, Effidit supports generation-based sentence completion, retrieval-based sentence completion, and phrase completion, whereas many other writing assistants so far only provide one or two of these three functions. For text polishing, we offer three functions: (context-aware) phrase polishing, sentence paraphrasing, and sentence expansion; again, many other writing assistants support only one or two functions in this category. The main contents of this report include the major modules of Effidit, methods for implementing these modules, and evaluation results of some key methods.

* Technical report for Effidit. arXiv admin note: text overlap with arXiv:2202.06417 

Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering

May 20, 2022
Shiquan Yang, Xinting Huang, Jey Han Lau, Sarah Erfani

Data artifacts incentivize machine learning models to learn non-transferable generalizations by exploiting shortcuts in the data, and there is growing evidence that data artifacts play a role in the strong results that deep learning models achieve on recent natural language processing benchmarks. In this paper, we focus on task-oriented dialogue and investigate whether popular datasets such as MultiWOZ contain such data artifacts. We found that by keeping only frequent phrases in the training examples, state-of-the-art models perform comparably to the variant trained with full data, suggesting they exploit these spurious correlations to solve the task. Motivated by this, we propose a contrastive learning based framework to encourage the model to ignore these cues and focus on learning generalisable patterns. We also experiment with adversarial filtering to remove "easy" training instances so that the model focuses on learning from the "harder" instances. We conduct a number of generalization experiments -- e.g., cross-domain/dataset and adversarial tests -- to assess the robustness of our approach and find that it works exceptionally well.
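
The adversarial-filtering step is straightforward to sketch: score each training instance with a weak model and keep only the instances it fails on. The `score` callable and the threshold below are stand-ins for whatever model and cutoff are used in practice.

```python
# Sketch of adversarial filtering: drop "easy" instances that a weak scorer
# already gets right, since those likely rely on shortcut cues.
from typing import Callable, List, Tuple

def adversarial_filter(
    data: List[Tuple[str, str]],
    score: Callable[[str, str], float],
    threshold: float = 0.9,
) -> List[Tuple[str, str]]:
    """Keep instances scored below `threshold` (the "harder" ones)."""
    return [(x, y) for x, y in data if score(x, y) < threshold]
```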


Generalizable and Explainable Dialogue Generation via Explicit Action Learning

Oct 08, 2020
Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang

Response generation for task-oriented dialogues implicitly optimizes two objectives at the same time: task completion and language quality. Conditioned response generation serves as an effective approach to separately and better optimize these two objectives. Such an approach relies on system action annotations, which are expensive to obtain. To alleviate the need for action annotations, latent action learning is introduced to map each utterance to a latent representation. However, this approach is prone to over-dependence on the training data, and its generalization capability is thus restricted. To address this issue, we propose to learn natural language actions that represent utterances as spans of words. This explicit action representation promotes generalization via the compositional structure of language, and it also enables an explainable generation process. Our proposed unsupervised approach learns a memory component to summarize system utterances into short spans of words. To further promote a compact action representation, we propose an auxiliary task that restores state annotations from the summarized dialogue context using the memory component. Our proposed approach outperforms latent action baselines on MultiWOZ, a benchmark multi-domain dataset.
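
A minimal sketch of the span-as-action idea: use a memory vector to score the tokens of a system utterance and pick the highest-scoring contiguous span as the natural language action. The scoring scheme here is illustrative, not the paper's model.

```python
# Sketch: select a short contiguous span of an utterance as its "action",
# scored against a dialogue memory vector.
import torch

def extract_action_span(token_embs: torch.Tensor, memory: torch.Tensor, span_len: int = 4):
    """token_embs: (seq_len, dim); memory: (dim,). Returns selected token indices."""
    scores = token_embs @ memory  # relevance of each token to the memory
    # Slide a window and pick the contiguous span with the highest total score.
    windows = scores.unfold(0, span_len, 1).sum(dim=1)  # (seq_len - span_len + 1,)
    start = int(windows.argmax())
    return list(range(start, start + span_len))

span = extract_action_span(torch.randn(12, 64), torch.randn(64))
```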

* Accepted to Proceedings of EMNLP 2020 (Findings) 

KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for Comprehension And Generation

May 24, 2020
Jiajing Wan, Xinting Huang

This paper presents our strategies for SemEval 2020 Task 4: Commonsense Validation and Explanation. We propose a novel way to search for evidence and choose different large-scale pre-trained models as the backbones for the three subtasks. The results show that our evidence-searching approach improves model performance on the commonsense explanation task. Our team ranks 2nd in subtask C according to the human evaluation score.
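
The abstract does not detail the evidence search, but a generic lexical-overlap ranker illustrates the flavor of such a component; this baseline is an assumption for illustration, not the team's actual strategy.

```python
# Illustrative evidence search: rank candidate sentences from a knowledge
# source by lexical overlap with the statement being validated.
import re
from typing import List

def rank_evidence(statement: str, candidates: List[str], top_k: int = 3) -> List[str]:
    stmt_tokens = set(re.findall(r"\w+", statement.lower()))

    def overlap(sent: str) -> int:
        return len(stmt_tokens & set(re.findall(r"\w+", sent.lower())))

    return sorted(candidates, key=overlap, reverse=True)[:top_k]
```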

* 6 pages, 1 figure 

Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation

May 09, 2020
Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang

In task-oriented dialogue systems, dialogue policy optimization typically obtains feedback only upon task completion. This is insufficient for training intermediate dialogue turns, since supervision signals (or rewards) are provided only at the end of dialogues. To address this issue, reward learning has been introduced to learn from state-action pairs of an optimal policy and provide turn-by-turn rewards. This approach requires complete state-action annotations of human-to-human dialogues (i.e., expert demonstrations), which is labor intensive. To overcome this limitation, we propose a novel reward learning approach for semi-supervised policy learning. The proposed approach learns a dynamics model as the reward function, which models dialogue progress (i.e., state-action sequences) based on expert demonstrations, either with or without annotations. The dynamics model computes rewards by predicting whether the dialogue progress is consistent with expert demonstrations. We further propose to learn action embeddings for better generalization of the reward function. The proposed approach outperforms competitive policy learning baselines on MultiWOZ, a benchmark multi-domain dataset.
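
The dynamics-model-as-reward idea can be sketched as a small discriminator over (state, action) transitions that outputs the probability the transition looks like expert progress. The architecture and dimensions below are illustrative assumptions, not the paper's exact model.

```python
# Sketch of a dynamics-model reward: score how expert-like a turn's
# (state, action) transition is, and use that score as a turn-level reward.
import torch
import torch.nn as nn

class DynamicsReward(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Output in (0, 1): probability the transition matches expert progress.
        logits = self.net(torch.cat([state, action], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)  # turn-level reward

reward_fn = DynamicsReward(state_dim=64, action_dim=32)
r = reward_fn(torch.randn(8, 64), torch.randn(8, 32))  # rewards for a batch of turns
```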
