Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kentaro Inui

MBZUAI, Tohoku University, RIKEN

Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Reasoning

Dec 02, 2024

Keito Kudo, Yoichi Aoki, Tatsuki Kuribayashi, Shusaku Sone, Masaya Taniguchi, Ana Brassard, Keisuke Sakaguchi, Kentaro Inui

Figure 1 for Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Reasoning

Figure 2 for Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Reasoning

Figure 3 for Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Reasoning

Figure 4 for Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Reasoning

Abstract:This study investigates the internal reasoning mechanism of language models during symbolic multi-step reasoning, motivated by the question of whether chain-of-thought (CoT) outputs are faithful to the model's internals. Specifically, we inspect when they internally determine their answers, particularly before or after CoT begins, to determine whether models follow a post-hoc "think-to-talk" mode or a step-by-step "talk-to-think" mode of explanation. Through causal probing experiments in controlled arithmetic reasoning tasks, we found systematic internal reasoning patterns across models; for example, simple subproblems are solved before CoT begins, and more complicated multi-hop calculations are performed during CoT.

Via

Access Paper or Ask Questions

The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces

Oct 17, 2024

Ahmed Oumar El-Shangiti, Tatsuya Hiraoka, Hilal AlQuabeh, Benjamin Heinzerling, Kentaro Inui

Abstract:This paper investigates whether large language models (LLMs) utilize numerical attributes encoded in a low-dimensional subspace of the embedding space when answering logical comparison questions (e.g., Was Cristiano born before Messi?). We first identified these subspaces using partial least squares regression, which effectively encodes the numerical attributes associated with the entities in comparison prompts. Further, we demonstrate causality by intervening in these subspaces to manipulate hidden states, thereby altering the LLM's comparison outcomes. Experimental results show that our findings hold for different numerical attributes, indicating that LLMs utilize the linearly encoded information for numerical reasoning.

Via

Access Paper or Ask Questions

Repetition Neurons: How Do Language Models Produce Repetitions?

Oct 17, 2024

Tatsuya Hiraoka, Kentaro Inui

Abstract:This paper introduces repetition neurons, regarded as skill neurons responsible for the repetition problem in text generation tasks. These neurons are progressively activated more strongly as repetition continues, indicating that they perceive repetition as a task to copy the previous context repeatedly, similar to in-context learning. We identify these repetition neurons by comparing activation values before and after the onset of repetition in texts generated by recent pre-trained language models. We analyze the repetition neurons in three English and one Japanese pre-trained language models and observe similar patterns across them.

Via

Access Paper or Ask Questions

Representational Analysis of Binding in Large Language Models

Sep 09, 2024

Qin Dai, Benjamin Heinzerling, Kentaro Inui

Abstract:Entity tracking is essential for complex reasoning. To perform in-context entity tracking, language models (LMs) must bind an entity to its attribute (e.g., bind a container to its content) to recall attribute for a given entity. For example, given a context mentioning ``The coffee is in Box Z, the stone is in Box M, the map is in Box H'', to infer ``Box Z contains the coffee'' later, LMs must bind ``Box Z'' to ``coffee''. To explain the binding behaviour of LMs, Feng and Steinhardt (2023) introduce a Binding ID mechanism and state that LMs use a abstract concept called Binding ID (BI) to internally mark entity-attribute pairs. However, they have not directly captured the BI determinant information from entity activations. In this work, we provide a novel view of the Binding ID mechanism by localizing the prototype of BI information. Specifically, we discover that there exists a low-rank subspace in the hidden state (or activation) of LMs, that primarily encodes the order of entity and attribute and which is used as the prototype of BI to causally determine the binding. To identify this subspace, we choose principle component analysis as our first attempt and it is empirically proven to be effective. Moreover, we also discover that when editing representations along directions in the subspace, LMs tend to bind a given entity to other attributes accordingly. For example, by patching activations along the BI encoding direction we can make the LM to infer ``Box Z contains the stone'' and ``Box Z contains the map''.

Via

Access Paper or Ask Questions

MQM-Chat: Multidimensional Quality Metrics for Chat Translation

Aug 29, 2024

Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Kentaro Inui

Abstract:The complexities of chats pose significant challenges for machine translation models. Recognizing the need for a precise evaluation metric to address the issues of chat translation, this study introduces Multidimensional Quality Metrics for Chat Translation (MQM-Chat). Through the experiments of five models using MQM-Chat, we observed that all models generated certain fundamental errors, while each of them has different shortcomings, such as omission, overly correcting ambiguous source content, and buzzword issues, resulting in the loss of stylized information. Our findings underscore the effectiveness of MQM-Chat in evaluating chat translation, emphasizing the importance of stylized content and dialogue consistency for future studies.

Via

Access Paper or Ask Questions

An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication

Aug 28, 2024

Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Kentaro Inui

Figure 1 for An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication

Figure 2 for An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication

Figure 3 for An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication

Figure 4 for An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication

* IJCNLP-AACL 2023 Student Research Workshop

Via

Access Paper or Ask Questions

Reducing the Cost: Cross-Prompt Pre-Finetuning for Short Answer Scoring

Aug 26, 2024

Hiroaki Funayama, Yuya Asazuma, Yuichiroh Matsubayashi, Tomoya Mizumoto, Kentaro Inui

Abstract:Automated Short Answer Scoring (SAS) is the task of automatically scoring a given input to a prompt based on rubrics and reference answers. Although SAS is useful in real-world applications, both rubrics and reference answers differ between prompts, thus requiring a need to acquire new data and train a model for each new prompt. Such requirements are costly, especially for schools and online courses where resources are limited and only a few prompts are used. In this work, we attempt to reduce this cost through a two-phase approach: train a model on existing rubrics and answers with gold score signals and finetune it on a new prompt. Specifically, given that scoring rubrics and reference answers differ for each prompt, we utilize key phrases, or representative expressions that the answer should contain to increase scores, and train a SAS model to learn the relationship between key phrases and answers using already annotated prompts (i.e., cross-prompts). Our experimental results show that finetuning on existing cross-prompt data with key phrases significantly improves scoring accuracy, especially when the training data is limited. Finally, our extensive analysis shows that it is crucial to design the model so that it can learn the task's general property.

* AIED 2023. Lecture Notes in Computer Science(), vol 13916.pp.78-89. Springer
* This is the draft submitted to AIED 2023. For the latest version, please visit: https://link.springer.com/chapter/10.1007/978-3-031-36272-9_7

Via

Access Paper or Ask Questions

First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning

Jun 23, 2024

Yoichi Aoki, Keito Kudo, Tatsuki Kuribayashi, Shusaku Sone, Masaya Taniguchi, Keisuke Sakaguchi, Kentaro Inui

Figure 1 for First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning

Figure 2 for First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning

Figure 3 for First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning

Figure 4 for First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning

Abstract:Multi-step reasoning is widely adopted in the community to explore the better performance of language models (LMs). We report on the systematic strategy that LMs use in this process. Our controlled experiments reveal that LMs rely more heavily on heuristics, such as lexical overlap, in the earlier stages of reasoning when more steps are required to reach an answer. Conversely, as LMs progress closer to the final answer, their reliance on heuristics decreases. This suggests that LMs track only a limited number of future steps and dynamically combine heuristic strategies with logical ones in tasks involving multi-step reasoning.

Via

Access Paper or Ask Questions

Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling

Jun 18, 2024

Irfan Robbani, Paul Reisert, Naoya Inoue, Surawat Pothong, Camélia Guerraoui, Wenzhi Wang, Shoichi Naito, Jungmin Choi, Kentaro Inui

Figure 1 for Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling

Figure 2 for Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling

Figure 3 for Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling

Figure 4 for Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling

Abstract:Prior research in computational argumentation has mainly focused on scoring the quality of arguments, with less attention on explicating logical errors. In this work, we introduce four sets of explainable templates for common informal logical fallacies designed to explicate a fallacy's implicit logic. Using our templates, we conduct an annotation study on top of 400 fallacious arguments taken from LOGIC dataset and achieve a high agreement score (Krippendorf's alpha of 0.54) and reasonable coverage (0.83). Finally, we conduct an experiment for detecting the structure of fallacies and discover that state-of-the-art language models struggle with detecting fallacy templates (0.47 accuracy). To facilitate research on fallacies, we make our dataset and guidelines publicly available.

Via

Access Paper or Ask Questions

The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

Jun 10, 2024

Ryosuke Takahashi, Go Kamoda, Benjamin Heinzerling, Keisuke Sakaguchi, Kentaro Inui

Figure 1 for The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

Figure 2 for The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

Figure 3 for The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

Figure 4 for The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

Abstract:Language models (LMs) encode world knowledge in their internal parameters through training. However, LMs may learn personal and confidential information from the training data, leading to privacy concerns such as data leakage. Therefore, research on knowledge deletion from LMs is essential. This study focuses on the knowledge stored in LMs and analyzes the relationship between the side effects of knowledge deletion and the entities related to the knowledge. Our findings reveal that deleting knowledge related to popular entities can have catastrophic side effects. Furthermore, this research is the first to analyze knowledge deletion in models trained on synthetic knowledge graphs, indicating a new direction for controlled experiments.

Via

Access Paper or Ask Questions