Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dongyeop Kang

UC Berkeley

BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

Dec 28, 2023

Zhecheng Sheng, Tianhao Zhang, Chen Jiang, Dongyeop Kang

Figure 1 for BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

Figure 2 for BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

Figure 3 for BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

Figure 4 for BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

Abstract:Measuring the coherence of text is a vital aspect of evaluating the quality of written content. Recent advancements in neural coherence modeling have demonstrated their efficacy in capturing entity coreference and discourse relations, thereby enhancing coherence evaluation. However, many existing methods heavily depend on static embeddings or focus narrowly on nearby context, constraining their capacity to measure the overarching coherence of long texts. In this paper, we posit that coherent texts inherently manifest a sequential and cohesive interplay among sentences, effectively conveying the central theme, purpose, or standpoint. To explore this abstract relationship, we introduce the "BBScore," a novel reference-free metric grounded in Brownian bridge theory for assessing text coherence. Our findings showcase that when synergized with a simple additional classification component, this metric attains a performance level comparable to state-of-the-art techniques on standard artificial discrimination tasks. We also establish in downstream tasks that this metric effectively differentiates between human-written documents and text generated by large language models under a specific domain. Furthermore, we illustrate the efficacy of this approach in detecting written styles attributed to diverse large language models, underscoring its potential for generalizability. In summary, we present a novel Brownian bridge coherence metric capable of measuring both local and global text coherence, while circumventing the need for end-to-end model training. This flexibility allows for its application in various downstream tasks.

* Accepted to the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24)

Via

Access Paper or Ask Questions

How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting!

Nov 16, 2023

Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang

Figure 1 for How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting!

Figure 2 for How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting!

Figure 3 for How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting!

Figure 4 for How Far Can We Extract Diverse Perspectives from Large Language Models? Criteria-Based Diversity Prompting!

Abstract:Collecting diverse human data on subjective NLP topics is costly and challenging. As Large Language Models (LLMs) have developed human-like capabilities, there is a recent trend in collaborative efforts between humans and LLMs for generating diverse data, offering potential scalable and efficient solutions. However, the extent of LLMs' capability to generate diverse perspectives on subjective topics remains an unexplored question. In this study, we investigate LLMs' capacity for generating diverse perspectives and rationales on subjective topics, such as social norms and argumentative texts. We formulate this problem as diversity extraction in LLMs and propose a criteria-based prompting technique to ground diverse opinions and measure perspective diversity from the generated criteria words. Our results show that measuring semantic diversity through sentence embeddings and distance metrics is not enough to measure perspective diversity. To see how far we can extract diverse perspectives from LLMs, or called diversity coverage, we employ a step-by-step recall prompting for generating more outputs from the model in an iterative manner. As we apply our prompting method to other tasks (hate speech labeling and story continuation), indeed we find that LLMs are able to generate diverse opinions according to the degree of task subjectivity.

* NLP

Via

Access Paper or Ask Questions

Which Modality should I use -- Text, Motif, or Image? : Understanding Graphs with Large Language Models

Nov 16, 2023

Debarati Das, Ishaan Gupta, Jaideep Srivastava, Dongyeop Kang

Figure 1 for Which Modality should I use -- Text, Motif, or Image? : Understanding Graphs with Large Language Models

Figure 2 for Which Modality should I use -- Text, Motif, or Image? : Understanding Graphs with Large Language Models

Figure 3 for Which Modality should I use -- Text, Motif, or Image? : Understanding Graphs with Large Language Models

Figure 4 for Which Modality should I use -- Text, Motif, or Image? : Understanding Graphs with Large Language Models

Abstract:Large language models (LLMs) are revolutionizing various fields by leveraging large text corpora for context-aware intelligence. Due to the context size, however, encoding an entire graph with LLMs is fundamentally limited. This paper explores how to better integrate graph data with LLMs and presents a novel approach using various encoding modalities (e.g., text, image, and motif) and approximation of global connectivity of a graph using different prompting methods to enhance LLMs' effectiveness in handling complex graph structures. The study also introduces GraphTMI, a new benchmark for evaluating LLMs in graph structure analysis, focusing on factors such as homophily, motif presence, and graph difficulty. Key findings reveal that image modality, supported by advanced vision-language models like GPT-4V, is more effective than text in managing token limits while retaining critical information. The research also examines the influence of different factors on each encoding modality's performance. This study highlights the current limitations and charts future directions for LLMs in graph understanding and reasoning tasks.

Via

Access Paper or Ask Questions

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

Oct 22, 2023

Hyungjoo Chae, Yongho Song, Kai Tzu-iunn Ong, Taeyoon Kwon, Minjin Kim, Youngjae Yu, Dongha Lee, Dongyeop Kang, Jinyoung Yeo

Figure 1 for Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

Figure 2 for Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

Figure 3 for Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

Figure 4 for Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

Abstract:Human-like chatbots necessitate the use of commonsense reasoning in order to effectively comprehend and respond to implicit information present within conversations. Achieving such coherence and informativeness in responses, however, is a non-trivial task. Even for large language models (LLMs), the task of identifying and aggregating key evidence within a single hop presents a substantial challenge. This complexity arises because such evidence is scattered across multiple turns in a conversation, thus necessitating integration over multiple hops. Hence, our focus is to facilitate such multi-hop reasoning over a dialogue context, namely dialogue chain-of-thought (CoT) reasoning. To this end, we propose a knowledge distillation framework that leverages LLMs as unreliable teachers and selectively distills consistent and helpful rationales via alignment filters. We further present DOCTOR, a DialOgue Chain-of-ThOught Reasoner that provides reliable CoT rationales for response generation. We conduct extensive experiments to show that enhancing dialogue agents with high-quality rationales from DOCTOR significantly improves the quality of their responses.

* 25 pages, 8 figures, Accepted to EMNLP 2023

Via

Access Paper or Ask Questions

Benchmarking Cognitive Biases in Large Language Models as Evaluators

Sep 29, 2023

Ryan Koo, Minhwa Lee, Vipul Raheja, Jong Inn Park, Zae Myung Kim, Dongyeop Kang

Figure 1 for Benchmarking Cognitive Biases in Large Language Models as Evaluators

Figure 2 for Benchmarking Cognitive Biases in Large Language Models as Evaluators

Figure 3 for Benchmarking Cognitive Biases in Large Language Models as Evaluators

Figure 4 for Benchmarking Cognitive Biases in Large Language Models as Evaluators

Abstract:Large Language Models (LLMs) have recently been shown to be effective as automatic evaluators with simple prompting and in-context learning. In this work, we assemble 15 LLMs of four different size ranges and evaluate their output responses by preference ranking from the other LLMs as evaluators, such as System Star is better than System Square. We then evaluate the quality of ranking outputs introducing the Cognitive Bias Benchmark for LLMs as Evaluators (CoBBLEr), a benchmark to measure six different cognitive biases in LLM evaluation outputs, such as the Egocentric bias where a model prefers to rank its own outputs highly in evaluation. We find that LLMs are biased text quality evaluators, exhibiting strong indications on our bias benchmark (average of 40% of comparisons across all models) within each of their evaluations that question their robustness as evaluators. Furthermore, we examine the correlation between human and machine preferences and calculate the average Rank-Biased Overlap (RBO) score to be 49.6%, indicating that machine preferences are misaligned with humans. According to our findings, LLMs may still be unable to be utilized for automatic annotation aligned with human preferences. Our project page is at: https://minnesotanlp.github.io/cobbler.

* Under review at ICLR 2024. 26 pages, 8 figures, 7 tables

Via

Access Paper or Ask Questions

Story Visualization by Online Text Augmentation with Context Memory

Aug 19, 2023

Daechul Ahn, Daneul Kim, Gwangmo Song, Seung Hwan Kim, Honglak Lee, Dongyeop Kang, Jonghyun Choi

Figure 1 for Story Visualization by Online Text Augmentation with Context Memory

Figure 2 for Story Visualization by Online Text Augmentation with Context Memory

Figure 3 for Story Visualization by Online Text Augmentation with Context Memory

Figure 4 for Story Visualization by Online Text Augmentation with Context Memory

Abstract:Story visualization (SV) is a challenging text-to-image generation task for the difficulty of not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences. While prior efforts mostly focus on generating a semantically relevant image for each sentence, encoding a context spread across the given paragraph to generate contextually convincing images (e.g., with a correct character or with a proper background of the scene) remains a challenge. To this end, we propose a novel memory architecture for the Bi-directional Transformer framework with an online text augmentation that generates multiple pseudo-descriptions as supplementary supervision during training for better generalization to the language variation at inference. In extensive experiments on the two popular SV benchmarks, i.e., the Pororo-SV and Flintstones-SV, the proposed method significantly outperforms the state of the arts in various metrics including FID, character F1, frame accuracy, BLEU-2/3, and R-precision with similar or less computational complexity.

* ICCV 2023, Project page: https://dcahn12.github.io/projects/CMOTA/

Via

Access Paper or Ask Questions

infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information

Jun 12, 2023

Jaehyung Kim, Yekyung Kim, Karin de Langis, Jinwoo Shin, Dongyeop Kang

Figure 1 for infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information

Figure 2 for infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information

Figure 3 for infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information

Figure 4 for infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information

Abstract:The success of NLP systems often relies on the availability of large, high-quality datasets. However, not all samples in these datasets are equally valuable for learning, as some may be redundant or noisy. Several methods for characterizing datasets based on model-driven meta-information (e.g., model's confidence) have been developed, but the relationship and complementary effects of these methods have received less attention. In this paper, we introduce infoVerse, a universal framework for dataset characterization, which provides a new feature space that effectively captures multidimensional characteristics of datasets by incorporating various model-driven meta-information. infoVerse reveals distinctive regions of the dataset that are not apparent in the original semantic space, hence guiding users (or models) in identifying which samples to focus on for exploration, assessment, or annotation. Additionally, we propose a novel sampling method on infoVerse to select a set of data points that maximizes informativeness. In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines in all applications. Our code and demo are publicly available.

* 26 pages, accepted at ACL 2023 conference as a full paper

Via

Access Paper or Ask Questions

Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning

Jun 08, 2023

Jaehyung Kim, Jinwoo Shin, Dongyeop Kang

Figure 1 for Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning

Figure 2 for Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning

Figure 3 for Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning

Figure 4 for Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning

Abstract:The development of largely human-annotated benchmarks has driven the success of deep neural networks in various NLP tasks. To enhance the effectiveness of existing benchmarks, collecting new additional input-output pairs is often too costly and challenging, particularly considering their marginal impact on improving the current model accuracy. Instead, additional or complementary annotations on the existing input texts in the benchmarks can be preferable as an efficient way to pay the additional human cost. In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation. From 'pair-wise' comparisons with respect to the task, the auxiliary preference learning enables the model to learn an additional informative training signal that cannot be captured with 'instance-wise' task labels. To this end, we propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences. Here, we provide three different ways to collect preference signals in practice: (a) implicitly extracting from annotation records (for free, but often unavailable), (b) collecting explicitly from crowd workers (high paid), or (c) pre-trained large language models such as GPT-3 (low paid). Given existing classification NLP benchmarks, we demonstrate that the proposed auxiliary preference learning via P2C on them is effective in improving text classifiers. Our codes are publicly available.

* 22 pages, accepted at ICML 2023

Via

Access Paper or Ask Questions

An Analysis of Reader Engagement in Literary Fiction through Eye Tracking and Linguistic Features

Jun 06, 2023

Rose Neis, Karin de Langis, Zae Myung Kim, Dongyeop Kang

Abstract:Capturing readers' engagement in fiction is a challenging but important aspect of narrative understanding. In this study, we collected 23 readers' reactions to 2 short stories through eye tracking, sentence-level annotations, and an overall engagement scale survey. We analyzed the significance of various qualities of the text in predicting how engaging a reader is likely to find it. As enjoyment of fiction is highly contextual, we also investigated individual differences in our data. Furthering our understanding of what captivates readers in fiction will help better inform models used in creative narrative generation and collaborative writing tools.

* 9 pages, 4 figures

Via

Access Paper or Ask Questions

Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction

May 24, 2023

Anna Martin-Boyle, Andrew Head, Kyle Lo, Risham Sidhu, Marti A. Hearst, Dongyeop Kang

Abstract:Mathematical symbol definition extraction is important for improving scholarly reading interfaces and scholarly information extraction (IE). However, the task poses several challenges: math symbols are difficult to process as they are not composed of natural language morphemes; and scholarly papers often contain sentences that require resolving complex coordinate structures. We present SymDef, an English language dataset of 5,927 sentences from full-text scientific papers where each sentence is annotated with all mathematical symbols linked with their corresponding definitions. This dataset focuses specifically on complex coordination structures such as "respectively" constructions, which often contain overlapping definition spans. We also introduce a new definition extraction method that masks mathematical symbols, creates a copy of each sentence for each symbol, specifies a target symbol, and predicts its corresponding definition spans using slot filling. Our experiments show that our definition extraction model significantly outperforms RoBERTa and other strong IE baseline systems by 10.9 points with a macro F1 score of 84.82. With our dataset and model, we can detect complex definitions in scholarly documents to make scientific writing more readable.

* 9 pages, 4 figures

Via

Access Paper or Ask Questions