Guohong Fu

OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

Sep 19, 2023
Juntao Li, Zecheng Tang, Yuyang Ding, Pinzheng Wang, Pei Guo, Wangjie You, Dan Qiao, Wenliang Chen, Guohong Fu, Qiaoming Zhu, Guodong Zhou, Min Zhang

Large language models (LLMs) with billions of parameters have demonstrated outstanding performance on various natural language processing tasks. This report presents OpenBA, an open-sourced 15B bilingual asymmetric seq2seq model, to contribute an LLM variant to the Chinese-oriented open-source model community. We enhance OpenBA with effective and efficient techniques and adopt a three-stage training strategy to train the model from scratch. Trained on only 380B tokens, our model achieves very competitive performance, outperforming LLaMA-70B on the BELEBELE benchmark, BLOOM-176B on the MMLU benchmark, and GLM-130B on the C-Eval (hard) benchmark. This report provides the main details needed to pre-train an analogous model, including pre-training data processing, Bilingual Flan data collection, the empirical observations that inspired our model architecture design, the training objectives of each stage, and other enhancement techniques. We have refactored our code to follow the design principles of the Huggingface Transformers library, making it more convenient for developers to use, and have released checkpoints of the different training stages at https://huggingface.co/openBA. More details of our project are available at https://github.com/OpenNLG/openBA.git.
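
Because the released checkpoints follow Huggingface Transformers conventions, loading them should reduce to the standard seq2seq API. The sketch below is a minimal, unverified usage example; the repository id and generation settings are assumptions, so check https://huggingface.co/openBA for the actual checkpoint names.

```python
# Minimal usage sketch (assumption: repo id and availability of a seq2seq head).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "OpenBA/OpenBA-LM"  # placeholder repo id; see https://huggingface.co/openBA

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# Encode a prompt, generate with the decoder, and decode the result.
inputs = tokenizer("将下面的句子翻译成英文：你好，世界。", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```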

RSpell: Retrieval-augmented Framework for Domain Adaptive Chinese Spelling Check

Aug 16, 2023
Siqi Song, Qi Lv, Lei Geng, Ziqiang Cao, Guohong Fu

Chinese Spelling Check (CSC) refers to the detection and correction of spelling errors in Chinese texts. In practical applications, CSC models need to be able to correct errors across different domains. In this paper, we propose a retrieval-augmented spelling check framework called RSpell, which searches for corresponding domain terms and incorporates them into CSC models. Specifically, we employ pinyin fuzzy matching to search for terms, which are combined with the input and fed into the CSC model. We then introduce an adaptive process control mechanism to dynamically adjust the impact of external knowledge on the model. Additionally, we develop an iterative strategy for the RSpell framework to enhance its reasoning capabilities. We conducted experiments on CSC datasets in three domains: law, medicine, and official document writing. The results show that RSpell achieves state-of-the-art performance in both zero-shot and fine-tuning scenarios, demonstrating the effectiveness of the retrieval-augmented CSC framework. Our code is available at https://github.com/47777777/Rspell.
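
To make the retrieval step concrete, here is a hedged sketch of pinyin fuzzy matching against a domain term list; the term list, similarity measure, threshold, and top-k are illustrative placeholders, not the authors' implementation.

```python
# Illustrative sketch of pinyin-based fuzzy term retrieval (not the RSpell code).
# Requires: pip install pypinyin
from difflib import SequenceMatcher
from pypinyin import lazy_pinyin

# Hypothetical domain term list, e.g. a few legal terms.
DOMAIN_TERMS = ["诉讼时效", "先予执行", "管辖权异议"]

def pinyin_of(text: str) -> str:
    return " ".join(lazy_pinyin(text))

def retrieve_terms(sentence: str, terms=DOMAIN_TERMS, threshold: float = 0.7, top_k: int = 3):
    """Score each term against every same-length span of the input by pinyin
    similarity and return the best-matching terms."""
    scored = []
    for term in terms:
        n = len(term)
        best = 0.0
        for i in range(max(len(sentence) - n + 1, 1)):
            span = sentence[i:i + n]
            best = max(best, SequenceMatcher(None, pinyin_of(term), pinyin_of(span)).ratio())
        if best >= threshold:
            scored.append((best, term))
    scored.sort(reverse=True)
    return [term for _, term in scored[:top_k]]

# Retrieved terms would then be concatenated with the input and fed to the CSC model.
print(retrieve_terms("法院裁定先于执行"))  # expected to surface "先予执行"
```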

* NLPCC2023 oral  

Discourse-Aware Emotion Cause Extraction in Conversations

Oct 26, 2022
Dexin Kong, Nan Yu, Yun Yuan, Guohong Fu, Chen Gong

Emotion Cause Extraction in Conversations (ECEC) aims to extract the utterances that contain the emotional cause in a conversation. Most prior research focuses on modelling conversational contexts with sequential encoding, ignoring the informative interactions between utterances and conversation-specific features for ECEC. In this paper, we investigate the importance of discourse structures in handling utterance interactions and conversation-specific features for ECEC. To this end, we propose a discourse-aware model (DAM) for this task. Concretely, we jointly model ECEC with discourse parsing using a multi-task learning (MTL) framework and explicitly encode discourse structures via a gated graph neural network (gated GNN), integrating rich utterance interaction information into our model. In addition, we use the gated GNN to further enhance our ECEC model with conversation-specific features. Results on the benchmark corpus show that DAM outperforms the state-of-the-art (SOTA) systems in the literature, suggesting that discourse structure may contain a potential link between emotional utterances and their corresponding cause expressions. It also verifies the effectiveness of conversation-specific features. The code for this paper will be made available on GitHub.
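
As a rough illustration of how a gated GNN can propagate information over a discourse graph of utterances, here is a minimal GRU-gated message-passing layer; the dimensions, adjacency construction, and single-layer design are assumptions for exposition, not the DAM architecture itself.

```python
# Minimal sketch of one gated-GNN step over a discourse graph (illustrative only).
import torch
import torch.nn as nn

class GatedGNNLayer(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.message = nn.Linear(hidden_size, hidden_size)
        self.gru = nn.GRUCell(hidden_size, hidden_size)

    def forward(self, utter_states: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # utter_states: (num_utterances, hidden); adj: (num_utterances, num_utterances).
        # Aggregate messages from discourse neighbours, then update with a GRU gate.
        messages = adj @ self.message(utter_states)
        return self.gru(messages, utter_states)

# Example: 5 utterances with 128-dim states and a toy discourse adjacency matrix.
states = torch.randn(5, 128)
adj = torch.eye(5)
adj[1, 0] = adj[2, 1] = adj[3, 1] = adj[4, 3] = 1.0  # hypothetical discourse links
updated = GatedGNNLayer(128)(states, adj)
print(updated.shape)  # torch.Size([5, 128])
```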

Visual Subtitle Feature Enhanced Video Outline Generation

Sep 01, 2022
Qi Lv, Ziqiang Cao, Wenrui Xie, Derui Wang, Jingwen Wang, Zhiwei Hu, Tangkun Zhang, Ba Yuan, Yuanhang Li, Min Cao, Wenjie Li, Sujian Li, Guohong Fu

With the tremendous increase in the number of videos, there is great demand for techniques that help people quickly navigate to the video segments they are interested in. However, current work on video understanding mainly focuses on video content summarization, while little effort has been made to explore the structure of a video. Inspired by textual outline generation, we introduce a novel video understanding task, namely video outline generation (VOG). This task contains two sub-tasks: (1) segmenting the video according to its content structure and (2) generating a heading for each segment. To learn and evaluate VOG, we annotate a 10k+ dataset called DuVOG. Specifically, we use OCR tools to recognize video subtitles, and annotators are asked to divide the subtitles into chapters and title each chapter. In videos, highlighted text tends to be the headline since it is more likely to attract attention. We therefore propose a Visual Subtitle feature Enhanced video outline generation model (VSENet), which takes as input the textual subtitles together with their visual font sizes and positions. We treat the VOG task as a sequence tagging problem that extracts the spans where headings are located and then rewrites them to form the final outline. Furthermore, based on the similarity between video outlines and textual outlines, we use a large number of articles with chapter headings to pretrain our model. Experiments on DuVOG show that our model largely outperforms other baseline methods, achieving an F1-score of 77.1 at the video segmentation level and a ROUGE-L_F0.5 of 85.0 at the headline generation level.
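
One way to picture the "text plus visual font/position" input is to add bucketed visual embeddings to the token embeddings before a tagging head. The sketch below is a simplified stand-in; the embedding sizes, bucket counts, vocabulary size, and BIO tag set are assumptions rather than VSENet's actual configuration.

```python
# Hedged sketch: fuse subtitle tokens with bucketed font-size/position features
# for heading-span tagging (illustrative, not the VSENet architecture).
import torch
import torch.nn as nn

class VisualSubtitleEncoder(nn.Module):
    def __init__(self, vocab_size=21128, hidden=256, num_size_buckets=8, num_pos_buckets=16):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)        # placeholder vocab size
        self.size_emb = nn.Embedding(num_size_buckets, hidden)   # bucketed font size
        self.pos_emb = nn.Embedding(num_pos_buckets, hidden)     # bucketed screen position
        self.tagger = nn.Linear(hidden, 3)                       # B / I / O span tags

    def forward(self, token_ids, size_buckets, pos_buckets):
        h = self.token_emb(token_ids) + self.size_emb(size_buckets) + self.pos_emb(pos_buckets)
        return self.tagger(h)  # per-token logits for heading-span extraction

# Toy batch: one subtitle stream of 6 tokens with random visual buckets.
logits = VisualSubtitleEncoder()(torch.randint(0, 21128, (1, 6)),
                                 torch.randint(0, 8, (1, 6)),
                                 torch.randint(0, 16, (1, 6)))
print(logits.shape)  # torch.Size([1, 6, 3])
```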

Revising Image-Text Retrieval via Multi-Modal Entailment

Sep 01, 2022
Xu Yan, Chunhui Ai, Ziqiang Cao, Min Cao, Sujian Li, Wenjie Li, Guohong Fu

An outstanding image-text retrieval model depends on high-quality labeled data. While the builders of existing image-text retrieval datasets strive to ensure that each caption matches its linked image, they cannot prevent a caption from also fitting other images. We observe that such many-to-many matching is quite common in widely used retrieval datasets, where one caption can describe up to 178 images. These missed matches not only confuse the model during training but also weaken evaluation accuracy. Inspired by visual and textual entailment tasks, we propose a multi-modal entailment classifier to determine whether a sentence is entailed by an image plus its linked captions. We then revise the image-text retrieval datasets by adding these entailed captions as additional weak labels of an image, and develop a universal variable learning rate strategy to teach a retrieval model to distinguish the entailed captions from other negative samples. In experiments, we manually annotate an entailment-corrected image-text retrieval dataset for evaluation. The results demonstrate that the proposed entailment classifier achieves about 78% accuracy and consistently improves the performance of image-text retrieval baselines.
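
One possible reading of the variable learning rate idea is that entailed captions, when sampled as negatives, should pull on the model far less than true negatives. The sketch below implements that intuition as a per-sample weighted triplet loss; the weighting scheme, margin, and values are illustrative guesses rather than the paper's actual strategy.

```python
# Hedged sketch: down-weight entailed "negatives" in a triplet loss so the model
# is not pushed away from captions that actually fit the image (illustrative only).
import torch
import torch.nn.functional as F

def weighted_triplet_loss(img_emb, pos_emb, neg_emb, neg_is_entailed,
                          margin=0.2, entailed_weight=0.1):
    """img_emb/pos_emb/neg_emb: (batch, dim); neg_is_entailed: (batch,) bool."""
    pos_sim = F.cosine_similarity(img_emb, pos_emb)
    neg_sim = F.cosine_similarity(img_emb, neg_emb)
    per_sample = F.relu(margin + neg_sim - pos_sim)
    # Entailed captions get a much smaller effective learning rate than true negatives.
    weights = torch.where(neg_is_entailed,
                          torch.full_like(per_sample, entailed_weight),
                          torch.ones_like(per_sample))
    return (weights * per_sample).mean()

loss = weighted_triplet_loss(torch.randn(4, 512), torch.randn(4, 512),
                             torch.randn(4, 512), torch.tensor([True, False, False, True]))
print(loss.item())
```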

* 10 pages 

General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining

Mar 21, 2022
Qi Lv, Ziqiang Cao, Lei Geng, Chunhui Ai, Xu Yan, Guohong Fu

The lack of labeled data is one of the major bottlenecks for Chinese Spelling Check (CSC). Existing research expands the supervised corpus by automatically generating training data from unlabeled text. However, there is a large gap between real input scenarios and such automatically generated corpora. We therefore develop a competitive general speller, ECSpell, which adopts an error-consistent masking strategy to create pretraining data. This strategy constrains the error types of automatically generated sentences to be consistent with those found in real scenarios. Experimental results indicate that our model outperforms previous state-of-the-art models on the general benchmark. Moreover, spellers often work within a particular domain in real life. Experiments on the domain-specific datasets we built show that general models perform poorly because of the many uncommon domain terms. Inspired by the common practice of input methods, we propose adding an alterable user dictionary to handle the zero-shot domain adaptation problem. Specifically, we attach a User Dictionary guided inference module (UD) to a general token-classification-based speller. Our experiments demonstrate that ECSpell$^{UD}$, namely ECSpell combined with UD, surpasses all other baselines by a large margin, even approaching the performance on the general benchmark.
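
A toy illustration of error-consistent masking follows: characters are replaced with phonetically or visually confusable ones, with phonetic errors sampled more often to mimic real error-type proportions. The confusion sets, ratios, and error rate are made-up placeholders, not the values used by ECSpell.

```python
# Illustrative corruption step for generating (noisy, clean) pretraining pairs.
import random

PHONETIC_CONFUSIONS = {"的": ["地", "得"], "在": ["再"]}  # toy confusion sets
VISUAL_CONFUSIONS = {"未": ["末"], "己": ["已"]}

def corrupt(sentence, phonetic_ratio=0.8, error_rate=0.15):
    """Return (corrupted, original); most injected errors are phonetic,
    mirroring the error-type distribution of real CSC data."""
    chars = list(sentence)
    for i, ch in enumerate(chars):
        if random.random() >= error_rate:
            continue
        table = PHONETIC_CONFUSIONS if random.random() < phonetic_ratio else VISUAL_CONFUSIONS
        if ch in table:
            chars[i] = random.choice(table[ch])
    return "".join(chars), sentence

print(corrupt("他的书还未看完"))
```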

Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments

Oct 13, 2021
Yu Zhang, Qingrong Xia, Shilin Zhou, Yong Jiang, Zhenghua Li, Guohong Fu, Min Zhang

Semantic role labeling (SRL) is a fundamental yet challenging task in the NLP community. Recent work on SRL mainly falls into two lines: 1) BIO-based and 2) span-based. Despite their effectiveness, both share an intrinsic drawback: they do not explicitly consider internal argument structure, which may limit the model's expressiveness. To remedy this, we propose to reduce SRL to a dependency parsing task and regard flat argument spans as latent subtrees. In particular, we equip our formulation with a novel span-constrained TreeCRF model to make tree structures span-aware, and further extend it to the second-order case. Experiments on the CoNLL05 and CoNLL12 benchmarks show that our methods outperform all previous work and achieve state-of-the-art results.
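
To give a flavour of treating a flat argument span as a latent subtree, the toy function below scores a span by marginalizing (via log-sum-exp) over which word inside the span acts as its latent head attached to the predicate. This is a heavy simplification for illustration only, not the span-constrained TreeCRF itself.

```python
# Toy sketch: the span score sums (in log space) over latent internal heads.
import torch

def span_score(arc_scores, pred_idx, span):
    """arc_scores: (n, n) head-to-dependent scores; span = (start, end), inclusive."""
    start, end = span
    candidate_heads = torch.arange(start, end + 1)
    # Each word in the span is a candidate latent head attached to the predicate;
    # the span score marginalizes over these choices.
    head_scores = arc_scores[pred_idx, candidate_heads]
    return torch.logsumexp(head_scores, dim=0)

scores = torch.randn(6, 6)  # random arc scores for a 6-word sentence
print(span_score(scores, pred_idx=2, span=(3, 5)))
```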

Unseen Target Stance Detection with Adversarial Domain Generalization

Oct 12, 2020
Zhen Wang, Qiansheng Wang, Chengguo Lv, Xue Cao, Guohong Fu

Although stance detection has made great progress in the past few years, it still faces the problem of unseen targets. In this study, we investigate the domain difference between targets and incorporate attention-based conditional encoding with adversarial domain generalization to perform unseen target stance detection. Experimental results show that our approach achieves new state-of-the-art performance on the SemEval-2016 dataset, demonstrating the importance of the domain difference between targets in unseen target stance detection.
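
A common way to realize adversarial domain training is a gradient-reversal layer feeding a domain (here, target) discriminator; the sketch below shows that generic mechanism and is not necessarily the exact construction used in this paper.

```python
# Generic gradient-reversal layer sketch for adversarial domain generalization.
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) gradients so the encoder learns target-invariant features.
        return -ctx.lambd * grad_output, None

features = torch.randn(4, 128, requires_grad=True)
reversed_feats = GradReverse.apply(features, 0.5)  # would feed a domain classifier
reversed_feats.sum().backward()
print(features.grad[0, :3])  # gradients are negated and scaled by 0.5
```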
