Zhifang Sui

Guiding AMR Parsing with Reverse Graph Linearization

Oct 13, 2023
Bofei Gao, Liang Chen, Peiyi Wang, Zhifang Sui, Baobao Chang

Abstract Meaning Representation (AMR) parsing aims to extract an abstract semantic graph from a given sentence. Sequence-to-sequence approaches, which linearize the semantic graph into a sequence of nodes and edges and generate the linearized graph directly, have achieved good performance. However, we observe that these approaches suffer from structure loss accumulation during decoding, leading to a much lower F1-score for nodes and edges decoded later than for those decoded earlier. To address this issue, we propose a novel Reverse Graph Linearization (RGL) enhanced framework. RGL defines both default and reverse linearization orders of an AMR graph, where most structures at the back of the default order appear at the front of the reversed order, and vice versa. RGL incorporates the reversed linearization into the original AMR parser through a two-pass self-distillation mechanism, which guides the model when it generates the default linearization. Our analysis shows that the proposed method significantly mitigates structure loss accumulation, outperforming the previously best AMR parsing model by 0.8 and 0.5 Smatch points on the AMR 2.0 and AMR 3.0 datasets, respectively. The code is available at https://github.com/pkunlp-icler/AMR_reverse_graph_linearization.
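To make the linearization-order idea concrete, here is a minimal, self-contained sketch (not the authors' implementation) of producing a default and a "reverse" PENMAN-style linearization of a toy AMR-like graph; the graph, the node names, and the reversal-by-child-order choice are illustrative assumptions.

```python
# Toy illustration of default vs. reverse linearization of an AMR-like graph.
# The graph below ("The boy wants to go") and the reversal strategy are
# illustrative; the paper defines its own reverse order over nodes and edges.

def linearize(graph, root, reverse=False):
    """Depth-first PENMAN-style linearization: ( node :edge child ... )."""
    children = graph.get(root, [])
    if reverse:
        children = list(reversed(children))  # visit late children first
    parts = [f"( {root}"]
    for edge, child in children:
        parts.append(f":{edge}")
        parts.append(linearize(graph, child, reverse=reverse))
    parts.append(")")
    return " ".join(parts)

graph = {
    "want-01": [("ARG0", "boy"), ("ARG1", "go-02")],
    "go-02":   [("ARG0", "boy")],
}

print("default:", linearize(graph, "want-01"))
print("reverse:", linearize(graph, "want-01", reverse=True))
```

Structures emitted last under the default order (here the go-02 subtree) appear early under the reverse order, which is what lets a reverse-pass parser supply guidance for the hardest, late-decoded parts.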

* Findings of EMNLP 2023 

Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning

Oct 12, 2023
Zhe Yang, Damai Dai, Peiyi Wang, Zhifang Sui

Large Language Models (LLMs) have recently gained In-Context Learning (ICL) ability as they scale up, allowing them to quickly adapt to downstream tasks with only a few demonstration examples prepended to the input sequence. Nonetheless, the current practice of ICL treats all demonstration examples equally, which still warrants improvement, as the quality of examples is usually uneven. In this paper, we investigate how to determine approximately optimal weights for demonstration examples and how to apply them during ICL. To assess the quality of weights in the absence of additional validation data, we design a masked self-prediction (MSP) score that exhibits a strong correlation with the final ICL performance. To expedite the weight-searching process, we discretize the continuous weight space and adopt beam search. With approximately optimal weights obtained, we further propose two strategies to apply them to demonstrations at different model positions. Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin. Our code is publicly available at https://github.com/Zhe-Young/WICL.
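As a rough sketch of the search procedure described above (discretized weights plus beam search), the following hypothetical code enumerates per-demonstration weights with a beam; the `msp_score` function is only a stand-in for the paper's masked self-prediction score, which would require querying an LLM.

```python
# Beam search over a discretized per-demonstration weight space.
# `msp_score` is a placeholder: in the paper it would be the MSP score
# computed with the LLM, not a pseudo-random number.
import random

CANDIDATE_WEIGHTS = [0.0, 0.5, 1.0, 1.5]   # assumed discretization
BEAM_SIZE = 3

def msp_score(weights):
    """Stand-in scorer; deterministic pseudo-random score per weight tuple."""
    random.seed(hash(weights) % (2 ** 32))
    return random.random()

def beam_search_weights(num_demos):
    beams = [((), 0.0)]                      # (partial weight tuple, score)
    for _ in range(num_demos):
        expanded = [(prefix + (w,), msp_score(prefix + (w,)))
                    for prefix, _ in beams for w in CANDIDATE_WEIGHTS]
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:BEAM_SIZE]
    return beams[0]

weights, score = beam_search_weights(num_demos=4)
print("selected demonstration weights:", weights, "score:", round(score, 3))
```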

* Findings of EMNLP 2023 

InfoCL: Alleviating Catastrophic Forgetting in Continual Text Classification from An Information Theoretic Perspective

Oct 10, 2023
Yifan Song, Peiyi Wang, Weimin Xiong, Dawei Zhu, Tianyu Liu, Zhifang Sui, Sujian Li

Continual learning (CL) aims to constantly learn new knowledge over time while avoiding catastrophic forgetting of old tasks. We focus on continual text classification under the class-incremental setting. Recent CL studies have identified the severe performance decrease on analogous classes as a key factor in catastrophic forgetting. In this paper, through an in-depth exploration of the representation learning process in CL, we discover that the compression effect of the information bottleneck leads to confusion among analogous classes. To enable the model to learn more sufficient representations, we propose a novel replay-based continual text classification method, InfoCL. Our approach utilizes fast-slow and current-past contrastive learning to perform mutual information maximization and better recover the previously learned representations. In addition, InfoCL incorporates an adversarial memory augmentation strategy to alleviate the overfitting problem of replay. Experimental results demonstrate that InfoCL effectively mitigates forgetting and achieves state-of-the-art performance on three text classification tasks. The code is publicly available at https://github.com/Yifan-Song793/InfoCL.
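The mutual-information-maximization component can be pictured as an InfoNCE-style contrastive loss between two views of the same batch (e.g., fast vs. slow encoder outputs). The sketch below is a generic version of that idea, not the authors' exact objective; the batch size, dimension, and temperature are arbitrary.

```python
# Generic InfoNCE contrastive loss between two views of the same examples,
# used here as a proxy for mutual information maximization.
import torch
import torch.nn.functional as F

def info_nce(view_a, view_b, temperature=0.1):
    """view_a, view_b: [batch, dim] representations of the same batch."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature           # pairwise similarities
    targets = torch.arange(a.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

fast = torch.randn(8, 128)   # e.g., current ("fast") encoder outputs
slow = torch.randn(8, 128)   # e.g., momentum ("slow") encoder outputs
print("InfoNCE loss:", info_nce(fast, slow).item())
```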

* Findings of EMNLP 2023. An improved version of arXiv:2305.07289 

Large Language Model for Science: A Study on P vs. NP

Sep 11, 2023
Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui, Furu Wei

In this work, we use large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in theoretical computer science and mathematics. Specifically, we propose Socratic reasoning, a general framework that promotes in-depth thinking with LLMs for complex problem-solving. Socratic reasoning encourages LLMs to recursively discover, solve, and integrate problems while facilitating self-evaluation and refinement. Our pilot study on the P vs. NP problem shows that GPT-4 successfully produces a proof schema and engages in rigorous reasoning throughout 97 dialogue turns, concluding "P ≠ NP", which is in alignment with (Xu and Zhou, 2023). The investigation uncovers novel insights within the extensive solution space of LLMs, shedding light on LLM for Science.
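The Socratic-reasoning loop can be caricatured as repeated decompose / solve / self-evaluate / integrate prompts. The sketch below only shows that control flow; `ask_llm` is a placeholder for a real model call, and the paper's actual prompts and 97-turn dialogue are far richer.

```python
# Highly simplified sketch of a Socratic-reasoning loop: decompose, solve,
# self-evaluate, integrate, repeated for a fixed number of turns.

def ask_llm(prompt):
    """Stand-in for a call to an LLM API (e.g., GPT-4)."""
    return f"[model response to: {prompt[:60]}...]"

def socratic_reasoning(problem, max_turns=5):
    transcript = []
    state = problem
    for _ in range(max_turns):
        subproblems = ask_llm(f"Decompose into sub-problems: {state}")
        solution = ask_llm(f"Solve and justify rigorously: {subproblems}")
        critique = ask_llm(f"Self-evaluate and refine this solution: {solution}")
        state = ask_llm(f"Integrate into the overall argument: {critique}")
        transcript.append((subproblems, solution, critique, state))
    return transcript

dialogue = socratic_reasoning("Is P equal to NP?")
print(f"{len(dialogue)} Socratic turns recorded")
```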

* 73 pages 

Making Large Language Models Better Reasoners with Alignment

Sep 05, 2023
Peiyi Wang, Lei Li, Liang Chen, Feifan Song, Binghuai Lin, Yunbo Cao, Tianyu Liu, Zhifang Sui

Reasoning is a cognitive process of using evidence to reach a sound conclusion. The reasoning capability is essential for large language models (LLMs) to serve as the brain of an artificial general intelligence agent. Recent studies reveal that fine-tuning LLMs on data with chain-of-thought (COT) reasoning processes can significantly enhance their reasoning capabilities. However, we find that the fine-tuned LLMs suffer from an Assessment Misalignment problem, i.e., they frequently assign higher scores to subpar COTs, leading to potential limitations in their reasoning abilities. To address this problem, we introduce an Alignment Fine-Tuning (AFT) paradigm, which involves three steps: 1) fine-tuning LLMs with COT training data; 2) generating multiple COT responses for each question and categorizing them into positive and negative ones based on whether they reach the correct answer; 3) calibrating the scores of positive and negative responses given by LLMs with a novel constraint alignment loss. Specifically, the constraint alignment loss has two objectives: a) Alignment, which guarantees that positive scores surpass negative scores, encouraging answers with high-quality COTs; b) Constraint, which keeps the negative scores confined to a reasonable range to prevent model degradation. Beyond binary positive and negative feedback, the constraint alignment loss can be seamlessly adapted to ranking situations when ranking feedback is available. Furthermore, we delve into recent ranking-based alignment methods, such as DPO, RRHF, and PRO, and discover that the constraint, which has been overlooked by these approaches, is also crucial for their performance. Extensive experiments on four reasoning benchmarks with both binary and ranking feedback demonstrate the effectiveness of AFT.
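A toy rendering of the two objectives (alignment plus constraint) might look like the hinge-style loss below; the margin, the boundary value, and the hinge form are illustrative assumptions rather than the paper's exact formulation.

```python
# Toy constraint alignment objective: an alignment term pushes positive-COT
# scores above negative-COT scores, and a constraint term keeps negative
# scores from collapsing below a boundary.
import torch
import torch.nn.functional as F

def constraint_alignment_loss(pos_scores, neg_scores, boundary, margin=0.1):
    """pos_scores, neg_scores: model scores (e.g., mean log-probs) of COTs."""
    # Alignment: every positive should outscore every negative by a margin.
    diff = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)   # [P, N]
    alignment = F.relu(margin - diff).mean()
    # Constraint: negatives should not be pushed arbitrarily low.
    constraint = F.relu(boundary - neg_scores).mean()
    return alignment + constraint

pos = torch.tensor([-0.8, -1.0])      # scores of correct-answer COTs
neg = torch.tensor([-1.2, -3.5])      # scores of wrong-answer COTs
print("loss:", constraint_alignment_loss(pos, neg, boundary=-2.0).item())
```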

* Large Language Models; Reasoning; Alignment 

Large Language Models are not Fair Evaluators

May 29, 2023
Peiyi Wang, Lei Li, Liang Chen, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, Zhifang Sui

We uncover a systematic bias in the evaluation paradigm of adopting large language models (LLMs), e.g., GPT-4, as a referee to score the quality of responses generated by candidate models. We find that the quality ranking of candidate responses can be easily hacked by simply altering their order of appearance in the context. This manipulation allows us to skew the evaluation result, making one model appear considerably superior to the other; e.g., Vicuna could beat ChatGPT on 66 of 80 tested queries. To address this issue, we propose two simple yet effective calibration strategies: 1) Multiple Evidence Calibration, which requires the evaluator model to generate multiple detailed pieces of evidence before assigning ratings; 2) Balanced Position Calibration, which aggregates results across various orders to determine the final score. Extensive experiments demonstrate that our approach successfully mitigates evaluation bias, resulting in closer alignment with human judgments. To facilitate future research on more robust large language model comparison, we integrate the techniques in the paper into an easy-to-use toolkit, FairEval, along with the human annotations (https://github.com/i-Eval/FairEval).
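Balanced Position Calibration, in particular, is easy to sketch: judge the pair in both presentation orders and average the scores so that position bias cancels out. The `judge` function below is a placeholder for the LLM referee, with a deliberately biased toy scorer.

```python
# Minimal sketch of Balanced Position Calibration: score both presentation
# orders and average.  `judge` stands in for the LLM referee; the real
# FairEval toolkit also implements Multiple Evidence Calibration.

def judge(question, first, second):
    """Stand-in for an LLM call returning (score_first, score_second)."""
    # Toy position bias: the response shown first gets a small bonus.
    return (len(first) % 5 + 1.0, len(second) % 5)

def balanced_position_calibration(question, resp_a, resp_b):
    a1, b1 = judge(question, resp_a, resp_b)   # A shown first
    b2, a2 = judge(question, resp_b, resp_a)   # B shown first
    return (a1 + a2) / 2, (b1 + b2) / 2

score_a, score_b = balanced_position_calibration("Explain overfitting.",
                                                 "Response from model A",
                                                 "Response from model B")
print("calibrated scores:", score_a, score_b)
```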

* work in progress 

Learn to Not Link: Exploring NIL Prediction in Entity Linking

May 25, 2023
Fangwei Zhu, Jifan Yu, Hailong Jin, Juanzi Li, Lei Hou, Zhifang Sui

Entity linking models have achieved significant success by utilizing pretrained language models to capture semantic features. However, the NIL prediction problem, which aims to identify mentions without a corresponding entity in the knowledge base, has received insufficient attention. We categorize mentions linking to NIL into Missing Entity and Non-Entity Phrase, and propose an entity linking dataset, NEL, that focuses on the NIL prediction problem. NEL takes ambiguous entities as seeds, collects relevant mention contexts from the Wikipedia corpus, and ensures the presence of mentions linking to NIL through human annotation and entity masking. We conduct a series of experiments with the widely used bi-encoder and cross-encoder entity linking models; the results show that both types of NIL mentions in the training data have a significant influence on the accuracy of NIL prediction. Our code and dataset can be accessed at https://github.com/solitaryzero/NIL_EL.
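For intuition, a bi-encoder with NIL prediction can be reduced to "return NIL when the best entity similarity falls below a threshold". In the hypothetical sketch below, the embeddings are random stand-ins for a trained encoder and the threshold is arbitrary.

```python
# Bi-encoder-style NIL prediction sketch: cosine-score the mention against all
# candidate entity embeddings and return NIL below a similarity threshold.
import numpy as np

rng = np.random.default_rng(0)
entity_names = ["Michael_Jordan_(basketball)", "Michael_Jordan_(scientist)"]
entity_embs = rng.normal(size=(len(entity_names), 64))
mention_emb = rng.normal(size=64)

def link(mention_emb, entity_embs, entity_names, nil_threshold=0.3):
    sims = entity_embs @ mention_emb
    sims = sims / (np.linalg.norm(entity_embs, axis=1) * np.linalg.norm(mention_emb))
    best = int(np.argmax(sims))
    if sims[best] < nil_threshold:
        return "NIL", float(sims[best])
    return entity_names[best], float(sims[best])

print(link(mention_emb, entity_embs, entity_names))
```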

* ACL Findings 2023 

Denoising Bottleneck with Mutual Information Maximization for Video Multimodal Fusion

May 25, 2023
Shaoxaing Wu, Damai Dai, Ziwei Qin, Tianyu Liu, Binghuai Lin, Yunbo Cao, Zhifang Sui

Video multimodal fusion aims to integrate multimodal signals in videos, such as visual, audio, and text, to make complementary predictions from the contents of multiple modalities. However, unlike other image-text multimodal tasks, video has longer multimodal sequences with more redundancy and noise in both the visual and audio modalities. Prior denoising methods, such as the forget gate, are coarse in the granularity of noise filtering: they often suppress redundant and noisy information at the risk of losing critical information. Therefore, we propose a denoising bottleneck fusion (DBF) model for fine-grained video multimodal fusion. On the one hand, we employ a bottleneck mechanism to filter out noise and redundancy with a restrained receptive field. On the other hand, we use a mutual information maximization module to regulate the filtering module so that it preserves key information within different modalities. Our DBF model achieves significant improvements over current state-of-the-art baselines on multiple benchmarks covering multimodal sentiment analysis and multimodal summarization tasks. This shows that our model can effectively capture salient features from noisy and redundant video, audio, and text inputs. The code for this paper is publicly available at https://github.com/WSXRHFG/DBF.
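The bottleneck idea can be sketched as a handful of learned query tokens cross-attending over the long concatenated multimodal sequence, so information must pass through those few tokens; the layer sizes below are arbitrary, and the mutual-information regularizer is omitted.

```python
# Rough sketch of a bottleneck fusion step: a few learned "bottleneck" tokens
# attend over the long, noisy concatenated video/audio/text sequence, so only
# information routed through these tokens survives.
import torch
import torch.nn as nn

dim, n_bottleneck = 64, 4
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
bottleneck = nn.Parameter(torch.randn(1, n_bottleneck, dim))

video = torch.randn(2, 50, dim)   # toy per-frame features
audio = torch.randn(2, 80, dim)   # toy audio features
text  = torch.randn(2, 20, dim)   # toy token features

multimodal = torch.cat([video, audio, text], dim=1)        # [B, 150, dim]
queries = bottleneck.expand(multimodal.size(0), -1, -1)    # [B, 4, dim]
fused, _ = attn(queries, multimodal, multimodal)           # [B, 4, dim]
print("fused bottleneck representation:", fused.shape)
```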

* Accepted at ACL 2023 

ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories

May 24, 2023
Heming Xia, Qingxiu Dong, Lei Li, Jingjing Xu, Ziwei Qin, Zhifang Sui

Recently, Pretrained Language Models (PLMs) have been serving as general-purpose interfaces, posing a significant demand for comprehensive visual knowledge. However, it remains unclear how well current PLMs and their visually augmented counterparts (VaLMs) can master visual commonsense knowledge. To investigate this, we propose ImageNetVC, a fine-grained, human-annotated dataset specifically designed for zero-shot visual commonsense evaluation across 1,000 ImageNet categories. Utilizing ImageNetVC, we delve into the fundamental visual commonsense knowledge of both unimodal PLMs and VaLMs, uncovering the scaling law and the influence of the backbone model on VaLMs. Furthermore, we investigate the factors affecting the visual commonsense knowledge of large-scale models, providing insights into the development of language models enriched with visual commonsense knowledge. Our code and dataset are available at https://github.com/hemingkx/ImageNetVC.
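Zero-shot evaluation of this kind typically ranks a small candidate-answer set by language-model likelihood of the completed prompt. In the sketch below, `sequence_logprob` is a stand-in for a real PLM scorer, and the question is an illustrative attribute query, not an item taken from ImageNetVC.

```python
# Sketch of zero-shot visual commonsense QA by likelihood ranking: score each
# candidate completion under a language model and pick the best one.

def sequence_logprob(text):
    """Stand-in for log p(text) under a pretrained language model."""
    return -float(len(text))   # toy proxy; a real scorer would query a PLM

def zero_shot_answer(question, candidates):
    scored = [(sequence_logprob(f"{question} {c}."), c) for c in candidates]
    return max(scored)[1]

question = "Q: What is the typical color of a lemon? A:"
print(zero_shot_answer(question, ["yellow", "turquoise", "transparent"]))
```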

Bi-Drop: Generalizable Fine-tuning for Pre-trained Language Models via Adaptive Subnetwork Optimization

May 24, 2023
Shoujie Tong, Heming Xia, Damai Dai, Tianyu Liu, Binghuai Lin, Yunbo Cao, Zhifang Sui

Pretrained language models have achieved remarkable success in a variety of natural language understanding tasks. Nevertheless, fine-tuning large pretrained models on downstream tasks is susceptible to overfitting when the training set is limited, which leads to diminished performance. In this work, we propose a dynamic fine-tuning strategy for pretrained language models called Bi-Drop. It utilizes the gradient information of various sub-models generated by dropout to update the model parameters selectively. Experiments on the GLUE benchmark show that Bi-Drop outperforms previous fine-tuning methods by a considerable margin and exhibits consistent superiority over vanilla fine-tuning across various pretrained models. Furthermore, empirical results indicate that Bi-Drop yields substantial improvements in multi-task or domain transfer, data-imbalance, and low-resource scenarios, demonstrating its strong generalization ability and robustness.
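One way to picture "using sub-model gradients to update parameters selectively" is the toy loop below: several backward passes with different dropout masks on the same mini-batch, followed by an update only where all sub-models' gradients agree in sign. The agreement rule, the tiny model, and the hyperparameters are illustrative, not the paper's actual selection criterion.

```python
# Toy sketch of dropout-based selective updates: collect gradients from
# several dropout sub-models, then update only where their signs agree.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.2), nn.Linear(32, 2))
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
loss_fn = nn.CrossEntropyLoss()

# Gradients from several dropout sub-models on the same mini-batch.
grads = []
for _ in range(4):
    model.zero_grad()
    loss_fn(model(x), y).backward()
    grads.append([p.grad.detach().clone() for p in model.parameters()])

lr, updated, total = 1e-3, 0, 0
with torch.no_grad():
    for i, p in enumerate(model.parameters()):
        stacked = torch.stack([g[i] for g in grads])            # [passes, ...]
        # Keep only parameters where every sub-model's gradient has the same sign.
        agree = (stacked.sign().sum(0).abs() == stacked.size(0)).float()
        p -= lr * stacked.mean(0) * agree
        updated += int(agree.sum()); total += agree.numel()
print(f"selectively updated {updated}/{total} parameters")
```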

* Work in progress. Co-first authors with equal contributions 