Muhao Chen

A Causal View of Entity Bias in (Large) Language Models

May 24, 2023
Fei Wang, Wenjie Mo, Yiwei Wang, Wenxuan Zhou, Muhao Chen

Entity bias widely affects pretrained (large) language models, causing them to rely excessively on (biased) parametric knowledge and make unfaithful predictions. Although causality-inspired methods have shown great potential to mitigate entity bias, it is hard to precisely estimate the parameters of the underlying causal models in practice. The rise of black-box LLMs makes the situation even worse, because their parameters are inaccessible and their logits are uncalibrated. To address these problems, we propose a specific structured causal model (SCM) whose parameters are comparatively easier to estimate. Building upon this SCM, we propose causal intervention techniques to mitigate entity bias in both white-box and black-box settings. The proposed causal intervention perturbs the original entity with neighboring entities. This intervention reduces specific biasing information pertaining to the original entity while still preserving sufficient common predictive information from similar entities. When evaluated on the relation extraction task, our training-time intervention significantly improves the F1 score of RoBERTa by 5.7 points on EntRED, in which spurious shortcuts between entities and labels are removed. Meanwhile, our in-context intervention effectively reduces the knowledge conflicts between parametric knowledge and contextual knowledge in GPT-3.5 and improves the F1 score by 9.14 points on a challenging test set derived from Re-TACRED.
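
As a rough illustration of the in-context intervention described above, the sketch below perturbs the subject entity with neighboring entities of the same type and aggregates the model's answers. The helpers `get_neighbor_entities` and `query_llm`, the majority-vote aggregation, and the prompt template are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter

def get_neighbor_entities(entity: str, k: int = 3) -> list[str]:
    # Hypothetical: return k entities of the same type that sit near `entity`
    # in some embedding space (here, a hard-coded toy lookup).
    neighbors = {"Bill Gates": ["Steve Jobs", "Larry Page", "Paul Allen"]}
    return neighbors.get(entity, [entity])[:k]

def query_llm(prompt: str) -> str:
    # Hypothetical LLM call; replace with a real API client.
    return "org:founded_by"

def intervened_predict(template: str, subj: str, obj: str) -> str:
    """Perturb the subject entity with its neighbors, query the model once per
    perturbed input, and take a majority vote over the predicted relations."""
    votes = Counter()
    for neighbor in get_neighbor_entities(subj):
        votes[query_llm(template.format(subj=neighbor, obj=obj))] += 1
    return votes.most_common(1)[0][0]

print(intervened_predict(
    "Context: {subj} founded {obj}.\nWhat is the relation between {subj} and {obj}?",
    subj="Bill Gates", obj="Microsoft"))
```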

* Work in progress 

Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations

May 24, 2023
James Y. Huang, Wenlin Yao, Kaiqiang Song, Hongming Zhang, Muhao Chen, Dong Yu

Traditional sentence embedding models encode sentences into vector representations to capture useful properties such as the semantic similarity between sentences. However, in addition to similarity, sentence semantics can also be interpreted via compositional operations such as sentence fusion or difference. It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space. To more effectively bridge the continuous embedding and discrete text spaces, we explore the plausibility of incorporating various compositional properties into the sentence embedding space, which allows us to interpret embedding transformations as compositional sentence operations. We propose InterSent, an end-to-end framework for learning interpretable sentence embeddings that supports compositional sentence operations in the embedding space. Our method optimizes operator networks and a bottleneck encoder-decoder model to produce meaningful and interpretable sentence embeddings. Experimental results demonstrate that our method significantly improves the interpretability of sentence embeddings on four textual generation tasks over existing approaches while maintaining strong performance on traditional semantic similarity tasks.
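
To make the notion of compositional operations in the embedding space concrete, here is a minimal sketch of an operator network that maps two sentence embeddings to the embedding of their fused sentence. The architecture, dimensions, and training setup are illustrative assumptions rather than the InterSent implementation.

```python
import torch
import torch.nn as nn

EMB_DIM = 768  # illustrative embedding size

class FusionOperator(nn.Module):
    """Operator network: maps two sentence embeddings to the embedding that a
    decoder would turn into their fused (merged) sentence."""
    def __init__(self, dim: int = EMB_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([emb_a, emb_b], dim=-1))

# Training would push decode(fuse(enc(a), enc(b))) to reconstruct a human-written
# fusion of sentences a and b, making the embedding-space operation interpretable.
fuse = FusionOperator()
emb_a, emb_b = torch.randn(1, EMB_DIM), torch.randn(1, EMB_DIM)
print(fuse(emb_a, emb_b).shape)  # torch.Size([1, 768])
```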

EntRED: Benchmarking Relation Extraction with Fewer Shortcuts

May 22, 2023
Yiwei Wang, Bryan Hooi, Fei Wang, Yujun Cai, Yuxuan Liang, Wenxuan Zhou, Jing Tang, Manjuan Duan, Muhao Chen

Entity names play an important role in relation extraction (RE) and often influence model performance. As a result, the entity names in a benchmark's test set significantly influence the evaluation of RE models. In this work, we find that the standard RE benchmarks contain a large portion of incorrect entity annotations, have low entity name diversity, and are prone to shortcuts from entity names to ground-truth relations. These issues make the standard benchmarks far from reflective of real-world scenarios. Hence, we present EntRED, a challenging RE benchmark with reduced shortcuts and higher entity diversity. To build EntRED, we propose ERIC, an end-to-end entity replacement pipeline based on causal inference (CI). ERIC performs type-constrained replacements on entities to reduce the shortcuts from entity bias to ground-truth relations. ERIC applies CI in two aspects: 1) targeting the instances that need entity replacements, and 2) determining the candidate entities for replacement. We apply ERIC to TACRED to produce EntRED. EntRED evaluates whether an RE model can correctly extract relations from the text instead of relying on entity bias. Empirical results reveal that even strong RE models, which memorize entity name patterns instead of reasoning from the textual context, suffer a significant performance drop on EntRED. We release ERIC's source code and the EntRED benchmark at https://github.com/wangywUST/ENTRED.
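
The sketch below illustrates the type-constrained replacement step in spirit only: the toy entity inventory and the way instances are chosen are simplified assumptions, whereas ERIC makes both decisions with causal inference.

```python
import random

# Hypothetical, toy entity inventory grouped by type.
CANDIDATES_BY_TYPE = {
    "PERSON": ["Ada Lovelace", "Akira Kurosawa", "Marie Curie"],
    "ORG": ["Acme Corp", "Globex", "Initech"],
}

def replace_entity(sentence: str, mention: str, entity_type: str,
                   rng: random.Random) -> str:
    """Swap an entity mention for another entity of the same type, so the
    relation label can no longer be guessed from the entity name alone."""
    candidates = [c for c in CANDIDATES_BY_TYPE[entity_type] if c != mention]
    return sentence.replace(mention, rng.choice(candidates))

rng = random.Random(0)
print(replace_entity("Bill Gates founded Microsoft in 1975.",
                     "Bill Gates", "PERSON", rng))
```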

* arXiv admin note: text overlap with arXiv:2109.05620 by other authors 

Take a Break in the Middle: Investigating Subgoals towards Hierarchical Script Generation

May 18, 2023
Xinze Li, Yixin Cao, Muhao Chen, Aixin Sun

Goal-oriented Script Generation is a new task of generating a list of steps that can fulfill a given goal. In this paper, we propose to extend the task from the perspective of cognitive theory. Instead of a simple flat structure, the steps are typically organized hierarchically: humans often decompose a complex task into subgoals, where each subgoal can be further decomposed into steps. To establish the benchmark, we contribute a new dataset, propose several baseline methods, and set up evaluation metrics. Both automatic and human evaluation verify the high quality of the dataset, as well as the effectiveness of incorporating subgoals into hierarchical script generation. Furthermore, we also design and evaluate a model to discover subgoals, and find that decomposing goals is somewhat more difficult than summarizing from segmented steps.
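
A minimal sketch of the hierarchical goal-subgoal-step structure discussed above; the data layout and example content are illustrative, not the paper's dataset format.

```python
from dataclasses import dataclass

@dataclass
class Subgoal:
    name: str
    steps: list[str]

@dataclass
class Script:
    goal: str
    subgoals: list[Subgoal]

# A goal decomposes into subgoals, each of which decomposes into steps.
script = Script(
    goal="Host a dinner party",
    subgoals=[
        Subgoal("Plan the menu", ["Pick a theme", "List dishes", "Buy groceries"]),
        Subgoal("Prepare the food", ["Prep ingredients", "Cook the dishes"]),
        Subgoal("Set up", ["Clean the dining room", "Set the table"]),
    ],
)
print([s.name for s in script.subgoals])
```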

* Accepted by ACL 2023 Findings 

Context-faithful Prompting for Large Language Models

Mar 20, 2023
Wenxuan Zhou, Sheng Zhang, Hoifung Poon, Muhao Chen

Large language models (LLMs) encode parametric knowledge about world facts and have shown remarkable performance in knowledge-driven NLP tasks. However, their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks (e.g., knowledge acquisition tasks). In this paper, we seek to assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention. We demonstrate that LLMs' faithfulness can be significantly improved using carefully designed prompting strategies. In particular, we identify opinion-based prompts and counterfactual demonstrations as the most effective methods. Opinion-based prompts reframe the context as a narrator's statement and inquire about the narrator's opinions, while counterfactual demonstrations use instances containing false facts to improve faithfulness in knowledge conflict situations. Neither technique requires additional training. We conduct experiments on three datasets of two standard NLP tasks, machine reading comprehension and relation extraction, and the results demonstrate significant improvement in faithfulness to contexts.
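
The sketch below shows one plausible rendering of the two prompting strategies; the exact template wording is an assumption, not the paper's verbatim prompts.

```python
def opinion_based_prompt(context: str, question: str) -> str:
    """Reframe the context as a narrator's statement and ask for the narrator's
    opinion, nudging the model to answer from the given context."""
    return (f'Bob said, "{context}"\n'
            f"Q: {question} in Bob's opinion?\n"
            "A:")

def counterfactual_demonstration() -> str:
    """An in-context example whose context asserts a false fact; the gold answer
    follows the context rather than world knowledge."""
    return ('Bob said, "The capital of France is Marseille."\n'
            "Q: What is the capital of France in Bob's opinion?\n"
            "A: Marseille\n\n")

prompt = counterfactual_demonstration() + opinion_based_prompt(
    context="Acme Corp was acquired by Globex in 2021.",
    question="Which company acquired Acme Corp",
)
print(prompt)
```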

* Code and data will be released at https://github.com/wzhouad/context-faithful-llm 

Continual Contrastive Finetuning Improves Low-Resource Relation Extraction

Dec 21, 2022
Wenxuan Zhou, Sheng Zhang, Tristan Naumann, Muhao Chen, Hoifung Poon

Relation extraction (RE), which has relied on structurally annotated corpora for model training, has been particularly challenging in low-resource scenarios and domains. Recent literature has tackled low-resource RE by self-supervised learning, where the solution involves pretraining relation embeddings with an RE-based objective and finetuning on labeled data with a classification-based objective. However, a critical challenge to this approach is the gap in objectives, which prevents the RE model from fully utilizing the knowledge in the pretrained representations. In this paper, we aim to bridge this gap and propose to pretrain and finetune the RE model using consistent objectives of contrastive learning. Since one relation may easily form multiple clusters in the representation space under this paradigm, we further propose a multi-center contrastive loss that allows one relation to form multiple clusters, better aligning finetuning with pretraining. Experiments on two document-level RE datasets, BioRED and Re-DocRED, demonstrate the effectiveness of our method. In particular, when using 1% of the end-task training data, our method outperforms a PLM-based RE classifier by 10.5% and 5.8% on the two datasets, respectively.
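
A minimal sketch of a multi-center contrastive loss in the spirit described above, where each relation owns several learnable centers and an instance is pulled toward its nearest same-relation center while being pushed from all other centers. The exact formulation, temperature, and number of centers are assumptions, not the paper's loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiCenterContrastiveLoss(nn.Module):
    def __init__(self, num_relations: int, centers_per_relation: int,
                 dim: int, temperature: float = 0.1):
        super().__init__()
        self.centers = nn.Parameter(
            torch.randn(num_relations, centers_per_relation, dim))
        self.tau = temperature

    def forward(self, reps: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        reps = F.normalize(reps, dim=-1)                  # (B, D)
        centers = F.normalize(self.centers, dim=-1)       # (R, K, D)
        sims = torch.einsum("bd,rkd->brk", reps, centers) / self.tau
        # Positive score: similarity to the closest center of the gold relation.
        pos = sims[torch.arange(len(labels)), labels].max(dim=-1).values
        # Denominator: log-sum-exp over all centers of all relations.
        denom = torch.logsumexp(sims.flatten(1), dim=-1)
        return (denom - pos).mean()

loss_fn = MultiCenterContrastiveLoss(num_relations=5, centers_per_relation=3, dim=64)
reps = torch.randn(8, 64)
labels = torch.randint(0, 5, (8,))
print(loss_fn(reps, labels).item())
```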

Multi-hop Evidence Retrieval for Cross-document Relation Extraction

Dec 21, 2022
Keming Lu, I-Hung Hsu, Wenxuan Zhou, Mingyu Derek Ma, Muhao Chen

Relation Extraction (RE) has been extended to cross-document scenarios because many relations are not simply described in a single document. This inevitably brings the challenge of efficient open-space evidence retrieval to support the inference of cross-document relations, along with the challenge of multi-hop reasoning over entities and evidence scattered across an open set of documents. To combat these challenges, we propose Mr.CoD, a multi-hop evidence retrieval method based on evidence path mining and ranking with adapted dense retrievers. We explore multiple variants of retrievers to show that evidence retrieval is an essential part of cross-document RE. Experiments on CodRED show that evidence retrieval with Mr.CoD effectively acquires cross-document evidence that essentially supports open-setting cross-document RE. Additionally, we show that Mr.CoD facilitates evidence retrieval and boosts end-to-end RE performance with effective multi-hop reasoning in both the closed and open settings of RE.
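
The toy sketch below conveys the idea of evidence path mining (first-hop retrieval from the head entity, expansion through bridge entities, then filtering by the tail). The `retrieve` function is a keyword-overlap stand-in for a dense retriever, and the whole pipeline is an assumption-laden illustration rather than Mr.CoD itself.

```python
def retrieve(query: str, corpus: dict[str, str], top_k: int = 2) -> list[str]:
    # Hypothetical dense retriever: here, naive keyword overlap as a stand-in.
    scored = sorted(corpus, key=lambda doc_id: -sum(
        w in corpus[doc_id].lower() for w in query.lower().split()))
    return scored[:top_k]

def mine_evidence_paths(head: str, tail: str, corpus: dict[str, str],
                        bridge_entities: dict[str, list[str]]):
    """First hop: retrieve documents about the head entity. Second hop: expand
    through bridge entities mentioned in those documents toward the tail."""
    paths = []
    for doc1 in retrieve(head, corpus):
        for bridge in bridge_entities.get(doc1, []):
            for doc2 in retrieve(f"{bridge} {tail}", corpus):
                if tail.lower() in corpus[doc2].lower():
                    paths.append((doc1, bridge, doc2))
    return paths

corpus = {
    "d1": "Alice Smith works at Acme Corp in Berlin.",
    "d2": "Acme Corp is headquartered in Berlin, Germany.",
}
print(mine_evidence_paths("Alice Smith", "Germany", corpus,
                          bridge_entities={"d1": ["Acme Corp"]}))
```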

* Work in progress 

Can NLI Provide Proper Indirect Supervision for Low-resource Biomedical Relation Extraction?

Dec 21, 2022
Jiashu Xu, Mingyu Derek Ma, Muhao Chen

Two key obstacles in biomedical relation extraction (RE) are the scarcity of annotations and the prevalence of instances without explicitly pre-defined labels due to low annotation coverage. Existing approaches, which treat biomedical RE as a multi-class classification task, often generalize poorly in low-resource settings and cannot make selective predictions on unknown cases; instead, they guess from the seen relations, which hinders their applicability. We present NBR, which converts biomedical RE into a natural language inference (NLI) formulation through indirect supervision. By converting relations to natural language hypotheses, NBR is capable of exploiting semantic cues to alleviate annotation scarcity. By incorporating a ranking-based loss that implicitly calibrates abstinent instances, NBR learns a clearer decision boundary and is instructed to abstain on uncertain instances. Extensive experiments on three widely used biomedical RE benchmarks, namely ChemProt, DDI and GAD, verify the effectiveness of NBR in both full-set and low-resource regimes. Our analysis demonstrates that indirect supervision benefits biomedical RE even when a domain gap exists, and that combining NLI knowledge with biomedical knowledge leads to the best performance gains.
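
A minimal sketch of casting RE as NLI with abstention in the spirit of NBR: `nli_entail_score`, the hypothesis templates, and the abstention threshold are hypothetical stand-ins, and NBR itself trains with a ranking-based loss rather than this simple thresholding.

```python
# Hypothetical verbalizations of candidate relations as NLI hypotheses.
RELATION_HYPOTHESES = {
    "CPR:3": "{chem} activates {gene}.",
    "CPR:4": "{chem} inhibits {gene}.",
}

def nli_entail_score(premise: str, hypothesis: str) -> float:
    # Hypothetical: probability that `premise` entails `hypothesis`,
    # which would come from a pretrained NLI model in practice.
    return 0.9 if "inhibits" in premise and "inhibits" in hypothesis else 0.1

def predict_relation(sentence: str, chem: str, gene: str,
                     abstain_threshold: float = 0.5) -> str:
    """Verbalize every candidate relation as a hypothesis, score entailment
    against the sentence, and abstain if no relation is entailed confidently."""
    scores = {
        rel: nli_entail_score(sentence, tpl.format(chem=chem, gene=gene))
        for rel, tpl in RELATION_HYPOTHESES.items()
    }
    best_rel, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_rel if best_score >= abstain_threshold else "no_relation"

print(predict_relation("Aspirin inhibits COX-2 expression.", "Aspirin", "COX-2"))
```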

* 16 pages 

On-the-fly Denoising for Data Augmentation in Natural Language Understanding

Dec 20, 2022
Tianqing Fang, Wenxuan Zhou, Fangyu Liu, Hongming Zhang, Yangqiu Song, Muhao Chen

Data Augmentation (DA) is frequently used to automatically provide additional training data without extra human annotation. However, data augmentation may introduce noisy data that impairs training. To guarantee the quality of augmented data, existing methods either assume no noise exists in the augmented data and adopt consistency training or use simple heuristics such as training loss and diversity constraints to filter out "noisy" data. However, those filtered examples may still contain useful information, and dropping them completely causes loss of supervision signals. In this paper, based on the assumption that the original dataset is cleaner than the augmented data, we propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data. A simple self-regularization module is applied to force the model prediction to be consistent across two distinct dropouts to further prevent overfitting on noisy labels. Our method can be applied to augmentation techniques in general and can consistently improve the performance on both text classification and question-answering tasks.
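
A minimal sketch of the training objective described above: augmented examples are supervised with soft labels from a teacher trained on the cleaner original data, and two dropout passes of the student are regularized to agree. The exact loss terms and weighting are assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def denoised_augmentation_loss(student: nn.Module, teacher: nn.Module,
                               aug_inputs: torch.Tensor,
                               consistency_weight: float = 1.0) -> torch.Tensor:
    with torch.no_grad():
        soft_labels = F.softmax(teacher(aug_inputs), dim=-1)  # organic teacher
    logits_1 = student(aug_inputs)  # two forward passes differ only by dropout
    logits_2 = student(aug_inputs)
    # Distillation from the teacher's soft labels on the augmented data.
    distill = F.kl_div(F.log_softmax(logits_1, dim=-1), soft_labels,
                       reduction="batchmean")
    # Self-regularization: keep the two dropout views consistent.
    consistency = F.kl_div(F.log_softmax(logits_1, dim=-1),
                           F.softmax(logits_2, dim=-1), reduction="batchmean")
    return distill + consistency_weight * consistency

student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.1), nn.Linear(32, 4))
teacher = nn.Sequential(nn.Linear(16, 4))
print(denoised_augmentation_loss(student, teacher, torch.randn(8, 16)).item())
```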

* 14 pages 

PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales

Nov 03, 2022
Peifeng Wang, Aaron Chan, Filip Ilievski, Muhao Chen, Xiang Ren

Neural language models (LMs) have achieved impressive results on various language-based reasoning tasks by utilizing latent knowledge encoded in their own pretrained parameters. To make this reasoning process more explicit, recent works retrieve a rationalizing LM's internal knowledge by training or prompting it to generate free-text rationales, which can be used to guide task predictions made by either the same LM or a separate reasoning LM. However, rationalizing LMs require expensive rationale annotation and/or computation, without any assurance that their generated rationales improve LM task performance or faithfully reflect LM decision-making. In this paper, we propose PINTO, an LM pipeline that rationalizes via prompt-based learning, and learns to faithfully reason over rationales via counterfactual regularization. First, PINTO maps out a suitable reasoning process for the task input by prompting a frozen rationalizing LM to generate a free-text rationale. Second, PINTO's reasoning LM is fine-tuned to solve the task using the generated rationale as context, while regularized to output less confident predictions when the rationale is perturbed. Across four datasets, we show that PINTO significantly improves the generalization ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets. Also, we find that PINTO's rationales are more faithful to its task predictions than those generated by competitive baselines.
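
A minimal sketch of the counterfactual regularization idea: the reasoning model is trained with the usual task loss on the input plus rationale, and pushed toward low-confidence (near-uniform) predictions when the rationale is perturbed. The loss form, stand-in model, and weighting are illustrative assumptions rather than the PINTO implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pinto_style_loss(model: nn.Module,
                     inputs_with_rationale: torch.Tensor,
                     inputs_with_perturbed_rationale: torch.Tensor,
                     labels: torch.Tensor,
                     reg_weight: float = 0.5) -> torch.Tensor:
    # Task loss: predict the label given the input and the generated rationale.
    task_logits = model(inputs_with_rationale)
    task_loss = F.cross_entropy(task_logits, labels)
    # Counterfactual regularizer: with a corrupted rationale, the prediction
    # distribution should be close to uniform (i.e., unconfident).
    cf_logits = model(inputs_with_perturbed_rationale)
    uniform = torch.full_like(cf_logits, 1.0 / cf_logits.size(-1))
    reg = F.kl_div(F.log_softmax(cf_logits, dim=-1), uniform, reduction="batchmean")
    return task_loss + reg_weight * reg

model = nn.Linear(16, 5)  # stand-in for the reasoning LM's classifier head
x_good, x_bad = torch.randn(8, 16), torch.randn(8, 16)
labels = torch.randint(0, 5, (8,))
print(pinto_style_loss(model, x_good, x_bad, labels).item())
```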

* 18 pages, 6 figures, preprint 