
Haibin Chen

Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference

Jun 19, 2023
Junhao Zheng, Qianli Ma, Shengjie Qiu, Yue Wu, Peitian Ma, Junlong Liu, Huawen Feng, Xichen Shang, Haibin Chen

Fine-tuning has proven to be a simple and effective technique for transferring the learned knowledge of Pre-trained Language Models (PLMs) to downstream tasks. However, vanilla fine-tuning easily overfits the target data and degrades the model's generalization ability. Most existing studies attribute this to catastrophic forgetting, and they retain the pre-trained knowledge indiscriminately without identifying which knowledge is transferable. Motivated by this, we frame fine-tuning as a causal graph and discover that the crux of catastrophic forgetting lies in the missing causal effects from the pre-training data. Based on this causal view, we propose a unified objective for fine-tuning that retrieves the lost causality. Intriguingly, the unified objective can be seen as the sum of the vanilla fine-tuning objective, which learns new knowledge from the target data, and a causal objective, which preserves old knowledge from the PLM. Our method is therefore flexible and can mitigate negative transfer while preserving knowledge. Since endowing models with commonsense is a long-standing challenge, we implement our method on commonsense QA with a proposed heuristic estimation to verify its effectiveness. In experiments, our method outperforms state-of-the-art fine-tuning methods on all six commonsense QA datasets and can be used as a plug-in module to boost the performance of existing QA models.
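
A minimal schematic of the unified objective described in the abstract, as an illustrative reading rather than the paper's exact formulation; the trade-off weight \lambda is an assumption here, since the abstract only states that the objective is the sum of the two terms:

```latex
% Illustrative decomposition of the unified fine-tuning objective:
% a vanilla fine-tuning term plus a causal (knowledge-preserving) term.
% The weight \lambda is an assumption, not stated in the abstract.
\[
  \mathcal{L}_{\mathrm{unified}}(\theta)
    = \underbrace{\mathcal{L}_{\mathrm{FT}}\bigl(\theta;\ \mathcal{D}_{\mathrm{target}}\bigr)}_{\text{learn new knowledge}}
    + \lambda\,
      \underbrace{\mathcal{L}_{\mathrm{causal}}\bigl(\theta;\ \mathrm{PLM}\bigr)}_{\text{preserve old knowledge}}
\]
```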

* ACL 2023 (oral paper) 

Distilling Causal Effect from Miscellaneous Other-Class for Continual Named Entity Recognition

Oct 08, 2022
Junhao Zheng, Zhanxian Liang, Haibin Chen, Qianli Ma

Continual Learning for Named Entity Recognition (CL-NER) aims to learn a growing number of entity types over time from a stream of data. However, simply learning Other-Class in the same way as new entity types amplifies catastrophic forgetting and leads to a substantial performance drop. The main cause is that Other-Class samples usually contain old entity types, and the old knowledge in these samples is not preserved properly. Through causal inference, we identify that the forgetting is caused by the missing causal effect from the old data. To this end, we propose a unified causal framework to retrieve the causality from both new entity types and Other-Class. Furthermore, we apply curriculum learning to mitigate the impact of label noise and introduce a self-adaptive weight to balance the causal effects between new entity types and Other-Class. Experimental results on three benchmark datasets show that our method outperforms the state-of-the-art method by a large margin. Moreover, our method can be combined with existing state-of-the-art methods to further improve performance in CL-NER.
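
One plausible form of the balanced objective sketched in the abstract, where a self-adaptive weight trades off the causal-effect terms recovered from new entity types and from Other-Class; all symbols are illustrative assumptions, as the abstract does not give the exact terms:

```latex
% Sketch of a balanced CL-NER objective: a base learning term plus two
% causal-effect terms weighted by a self-adaptive weight w. All symbols
% are illustrative; the paper's exact formulation is not given here.
\[
  \mathcal{L}_{\mathrm{CL\text{-}NER}}
    = \mathcal{L}_{\mathrm{new}}
    + w\,\mathcal{E}_{\mathrm{causal}}^{\mathrm{new}}
    + (1 - w)\,\mathcal{E}_{\mathrm{causal}}^{\mathrm{other}}
\]
```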

* Accepted by EMNLP 2022