Yanda Chen

Department of Computer Science, Columbia University

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Jul 17, 2023
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown

Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different inputs? To answer these questions, we propose to evaluate $\textbf{counterfactual simulatability}$ of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input. For example, if a model answers "yes" to the input question "Can eagles fly?" with the explanation "all birds can fly", then humans would infer from the explanation that it would also answer "yes" to the counterfactual input "Can penguins fly?". If the explanation is precise, then the model's answer should match humans' expectations. We implemented two metrics based on counterfactual simulatability: precision and generality. We generated diverse counterfactuals automatically using LLMs. We then used these metrics to evaluate state-of-the-art LLMs (e.g., GPT-4) on two tasks: multi-hop factual reasoning and reward modeling. We found that LLMs' explanations have low precision and that precision does not correlate with plausibility. Therefore, naively optimizing human approvals (e.g., RLHF) may not be a sufficient solution.
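
Concretely, the precision metric can be read as the fraction of counterfactual inputs on which a simulator's guess, based only on the explanation, matches the model's actual output. The sketch below illustrates that reading; generate_counterfactuals, simulator, and model are hypothetical placeholders for the LLM-based components, not the paper's implementation.

    # Illustrative sketch of the counterfactual-simulatability precision metric.
    # All callables are hypothetical placeholders for the LLM-based components.
    def explanation_precision(question, explanation, model, simulator, generate_counterfactuals):
        """Fraction of counterfactuals on which the simulator's inference from the
        explanation matches the model's actual answer."""
        counterfactuals = generate_counterfactuals(question)   # e.g. "Can penguins fly?"
        matches, judged = 0, 0
        for cf in counterfactuals:
            guess = simulator(explanation, cf)  # inferred answer, or None if the explanation is uninformative
            if guess is None:
                continue                        # the explanation says nothing about this counterfactual
            judged += 1
            matches += int(guess == model(cf))
        return matches / judged if judged else float("nan")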

In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models

Dec 20, 2022
Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown

Given the success of in-context learning with large pre-trained language models, we introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller models. We propose to combine in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge into smaller models. We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT). Multitask-ICT performs better on multitask few-shot learning but also requires more computation than Meta-ICT. Our method shows consistent improvements for both Meta-ICT and Multitask-ICT on two benchmarks: LAMA and CrossFit. Our extensive experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm: in-context learning objectives achieve the best performance when combined with language modeling objectives.
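
As a rough illustration of how the two objectives might be combined during distillation, the sketch below mixes a KL-based in-context learning objective (matching the teacher's label distribution on the target example) with a standard language-modeling cross-entropy on the in-context sequence. The weighting alpha, the temperature, and the helper signature are assumptions, not the paper's exact formulation.

    # Hedged sketch (not the authors' code) of an in-context learning
    # distillation loss that combines the two objectives.
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, lm_logits, lm_labels,
                          alpha=0.5, temperature=2.0):
        # In-context learning objective: KL between teacher and student label distributions.
        icl_loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Language modeling objective: next-token cross-entropy on the full sequence.
        lm_loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)), lm_labels.view(-1))
        return alpha * icl_loss + (1 - alpha) * lm_loss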

On the Relation between Sensitivity and Accuracy in In-context Learning

Sep 16, 2022
Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He

In-context learning (ICL) suffers from oversensitivity to the prompt, which makes it unreliable in real-world scenarios. We study the sensitivity of ICL with respect to multiple types of perturbations. First, we find that label bias obscures true ICL sensitivity, and hence prior work may have significantly underestimated it. Second, we observe a strong negative correlation between ICL sensitivity and accuracy, with sensitive predictions less likely to be correct. Motivated by these observations, we propose SenSel, a few-shot selective prediction method based on ICL sensitivity. Experiments on ten classification benchmarks show that SenSel consistently outperforms a commonly used confidence-based selective prediction baseline.
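
A minimal sketch of sensitivity-based selective prediction in this spirit: perturb the prompt (here, by reordering the in-context demonstrations), measure how often the prediction flips, and abstain when the sensitivity exceeds a threshold. The perturbation type, sample count, and threshold are illustrative assumptions, not SenSel's exact procedure.

    # Hedged sketch of sensitivity-based selective prediction.
    import random

    def sensitivity(predict, demonstrations, test_input, n_perturbations=8, seed=0):
        """predict(demos, x) -> label. Sensitivity = fraction of demonstration
        reorderings that flip the prediction."""
        rng = random.Random(seed)
        base_pred = predict(demonstrations, test_input)
        flips = 0
        for _ in range(n_perturbations):
            shuffled = demonstrations[:]
            rng.shuffle(shuffled)
            flips += int(predict(shuffled, test_input) != base_pred)
        return base_pred, flips / n_perturbations

    def selective_predict(predict, demonstrations, test_input, threshold=0.25):
        pred, s = sensitivity(predict, demonstrations, test_input)
        return pred if s <= threshold else None   # None = abstain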

Meta-learning via Language Model In-context Tuning

Oct 15, 2021
Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis, He He

The goal of meta-learning is to learn to adapt to a new task with only a few labeled examples. To tackle this problem in NLP, we propose $\textit{in-context tuning}$, which recasts adaptation and prediction as a simple sequence prediction problem: to form the input sequence, we concatenate the task instruction, the labeled examples, and the target input to predict; to meta-train the model to learn from in-context examples, we fine-tune a pre-trained language model (LM) to predict the target label from the input sequences on a collection of tasks. We benchmark our method on two collections of text classification tasks: LAMA and BinaryClfs. Compared to first-order MAML, which adapts the model with gradient descent, our method better leverages the inductive bias of LMs to perform pattern matching, and outperforms MAML by an absolute $6\%$ AUC-ROC score on BinaryClfs, with an increasing advantage w.r.t. model size. Compared to non-fine-tuned in-context learning (i.e., prompting a raw LM), in-context tuning directly learns to learn from in-context examples. On BinaryClfs, in-context tuning improves the average AUC-ROC score by an absolute $10\%$, and reduces the variance with respect to example ordering by 6x and example choices by 2x.
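
The sketch below illustrates how such an input sequence might be assembled from the task instruction, the in-context examples, and the target input; the template and separators are assumptions for illustration, not the paper's exact format.

    # Illustrative template for an in-context tuning sequence; the LM is then
    # fine-tuned to predict the label tokens that follow the final "Label:".
    def build_sequence(instruction, examples, target_input):
        """examples: list of (input_text, label_text) pairs from the same task."""
        parts = [instruction]
        for x, y in examples:
            parts.append(f"Input: {x}\nLabel: {y}")
        parts.append(f"Input: {target_input}\nLabel:")
        return "\n\n".join(parts)

    seq = build_sequence(
        "Decide whether the review is positive or negative.",
        [("Great acting and a moving story.", "positive"),
         ("The plot made no sense at all.", "negative")],
        "I would not watch this again.",
    )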

Cross-language Sentence Selection via Data Augmentation and Rationale Training

Jun 04, 2021
Yanda Chen, Chris Kedzie, Suraj Nair, Petra Galuščáková, Rui Zhang, Douglas W. Oard, Kathleen McKeown

This paper proposes an approach to cross-language sentence selection in a low-resource setting. It uses data augmentation and negative sampling techniques on noisy parallel sentence data to directly learn a cross-lingual embedding-based query relevance model. Results show that this approach performs as well as or better than multiple state-of-the-art machine translation + monolingual retrieval systems trained on the same parallel data. Moreover, when a rationale training secondary objective is applied to encourage the model to match word alignment hints from a phrase-based statistical machine translation model, consistent improvements are seen across three language pairs (English-Somali, English-Swahili and English-Tagalog) over a variety of state-of-the-art baselines.
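
As a rough sketch of the data augmentation idea, the snippet below builds query-relevance training pairs from noisy parallel sentences with negative sampling: a short query drawn from the English side is paired with its aligned foreign-language sentence as a positive, and with randomly sampled foreign-language sentences as negatives. The sampling scheme is an illustrative assumption, not the paper's pipeline.

    # Hedged sketch of constructing relevance training pairs via negative sampling.
    import random

    def make_training_pairs(parallel_pairs, negatives_per_positive=2, seed=0):
        """parallel_pairs: list of (english_sentence, foreign_sentence) tuples."""
        rng = random.Random(seed)
        foreign_pool = [f for _, f in parallel_pairs]
        examples = []
        for english, foreign in parallel_pairs:
            words = english.split()
            query = " ".join(words[:rng.randint(1, min(3, len(words)))])  # short query from the English side
            examples.append((query, foreign, 1))                          # relevant
            for _ in range(negatives_per_positive):
                examples.append((query, rng.choice(foreign_pool), 0))     # assumed irrelevant
        return examples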

* ACL 2021 main conference 

Improved Synthetic Training for Reading Comprehension

Oct 24, 2020
Yanda Chen, Md Arafat Sultan, Vittorio Castelli

Automatically generated synthetic training examples have been shown to improve performance in machine reading comprehension (MRC). Compared to human annotated gold standard data, synthetic training data has unique properties, such as high availability at the possible expense of quality. In view of such differences, in this paper, we explore novel applications of synthetic examples to MRC. Our proposed pre-training and knowledge distillation strategies show significant improvements over existing methods. In a particularly surprising discovery, we observe that synthetic distillation often yields students that can outperform the teacher model.
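
One way to picture synthetic distillation for extractive reading comprehension is to train the student on synthetic question-answer examples, using the teacher's answer-span distributions as soft targets, as in the hedged sketch below; the loss form and temperature are assumptions, not the authors' implementation.

    # Hedged sketch of a span-level distillation loss for extractive MRC.
    import torch.nn.functional as F

    def span_distillation_loss(student_start, student_end, teacher_start, teacher_end,
                               temperature=1.0):
        """All arguments are logits over passage token positions, shape (batch, seq_len)."""
        def kl(student_logits, teacher_logits):
            return F.kl_div(
                F.log_softmax(student_logits / temperature, dim=-1),
                F.softmax(teacher_logits / temperature, dim=-1),
                reduction="batchmean",
            ) * temperature ** 2
        return kl(student_start, teacher_start) + kl(student_end, teacher_end)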

* 11 pages, 2 figures 

Detecting and Reducing Bias in a High Stakes Domain

Aug 29, 2019
Ruiqi Zhong, Yanda Chen, Desmond Patton, Charlotte Selous, Kathy McKeown

Gang-involved youth in cities such as Chicago sometimes post on social media to express their aggression towards rival gangs, and previous research has demonstrated that a deep learning approach can predict aggression and loss in posts. To address the possibility of bias in this sensitive application, we developed an approach to systematically interpret the state-of-the-art model. We found, surprisingly, that it frequently bases its predictions on stop words such as "a" or "on", an approach that could harm social media users who have no aggressive intentions. To tackle this bias, domain experts annotated the rationales, highlighting words that explain why a tweet is labeled as "aggression". These new annotations enable us to quantitatively measure how justified the model predictions are, and to build models that drastically reduce bias. Our study shows that in high-stakes scenarios, accuracy alone cannot guarantee a good system, and we need new evaluation methods.
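
One simple way to quantify how justified a prediction is, given such rationale annotations, is to measure the overlap between the model's most important words and the annotated rationale, alongside the fraction of important words that are stop words; the sketch below illustrates that idea and is not necessarily the exact metric used in the paper.

    # Hedged sketch of rationale-based justification metrics.
    STOP_WORDS = {"a", "an", "the", "on", "in", "of", "to"}

    def rationale_overlap(important_words, rationale_words):
        """Fraction of the model's top-weighted words that appear in the human rationale."""
        important = set(important_words)
        return len(important & set(rationale_words)) / len(important) if important else 0.0

    def stopword_reliance(important_words):
        """Fraction of the model's top-weighted words that are stop words."""
        words = list(important_words)
        return sum(w.lower() in STOP_WORDS for w in words) / len(words) if words else 0.0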
