Automatically fixing software bugs is a challenging task. While recent work showed that natural language context is useful in guiding bug-fixing models, that approach required prompting developers to provide this context, which was simulated through commit messages written after the bug-fixing code changes were made. We instead propose using bug report discussions, which are available before the fix is written and occur naturally, avoiding the need for any additional information from developers. To this end, we augment standard bug-fixing datasets with bug report discussions. Using these newly compiled datasets, we demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even yielding better performance than using the commit messages corresponding to the oracle bug-fixing commits.
Crises such as the COVID-19 pandemic continuously threaten our world and emotionally affect billions of people worldwide in distinct ways. Understanding the triggers of people's emotions is of crucial importance. Social media posts can be a good source for such analysis, yet these texts tend to be charged with multiple emotions, with triggers scattered across multiple sentences. This paper takes a novel angle, namely emotion detection and trigger summarization, aiming both to detect perceived emotions in text and to summarize the events and their appraisals that trigger each emotion. To support this goal, we introduce CovidET (Emotions and their Triggers during Covid-19), a dataset of ~1,900 English Reddit posts related to COVID-19, which contains manual annotations of perceived emotions and abstractive summaries of their triggers as described in the posts. We develop strong baselines to jointly detect emotions and summarize emotion triggers. Our analyses show that CovidET presents new challenges in emotion-specific summarization, as well as in multi-emotion detection in long social media posts.
Automatic discourse processing, which can help understand how sentences connect to each other, is bottlenecked by data: current discourse formalisms pose highly demanding annotation tasks involving large taxonomies of discourse relations, making them inaccessible to lay annotators. This work instead adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis and seeks to derive QUD structures automatically. QUD views each sentence as an answer to a question triggered in prior context; thus, we characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained taxonomies. We develop a first-of-its-kind QUD parser that derives a dependency structure of questions over full documents, trained on DCQA, a large question-answering dataset annotated in a manner consistent with the QUD framework. Importantly, data collection under DCQA's paradigm is easily crowdsourced. We show that this leads to a parser attaining strong performance according to human evaluation. We illustrate how our QUD structure is distinct from RST trees and demonstrate the utility of QUD analysis in the context of document simplification. Our findings show that QUD parsing is an appealing alternative for automatic discourse processing.
The recent success of zero- and few-shot prompting with models like GPT-3 has led to a paradigm shift in NLP research. In this paper, we study its impact on text summarization, focusing on the classic benchmark domain of news summarization. First, we investigate how zero-shot GPT-3 compares against fine-tuned models trained on large summarization datasets. We show that not only do humans overwhelmingly prefer GPT-3 summaries, but these also do not suffer from common dataset-specific issues such as poor factuality. Next, we study what this means for evaluation, particularly the role of gold-standard test sets. Our experiments show that both reference-based and reference-free automatic metrics, e.g., recently proposed QA- or entailment-based factuality approaches, cannot reliably evaluate zero-shot summaries. Finally, we discuss future research challenges beyond generic summarization, specifically keyword- and aspect-based summarization, showing how dominant fine-tuning approaches compare to zero-shot prompting. To support further research, we release (a) a corpus of 10K generated summaries from fine-tuned and zero-shot models across 4 standard summarization benchmarks, and (b) 1K human preference judgments and rationales comparing different systems for generic- and keyword-based summarization.
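As an illustration of the zero-shot setup referred to above, below is a minimal sketch of prompting GPT-3 to summarize a news article through the legacy (pre-1.0) OpenAI Completions API. The prompt wording, model name, and decoding settings are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch: zero-shot news summarization with GPT-3 via the legacy
# (openai<1.0) Completions API. Prompt template, model choice, and decoding
# parameters are assumptions for illustration, not the paper's exact setup.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def zero_shot_summarize(article: str, model: str = "text-davinci-002") -> str:
    prompt = f"Article: {article}\n\nSummarize the above article in a few sentences:"
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=150,   # rough budget for a short news summary
        temperature=0.0,  # deterministic decoding, convenient for evaluation
    )
    return response["choices"][0]["text"].strip()
```

Fine-tuned baselines, by contrast, are trained on the reference summaries of a specific dataset, which is where the dataset-specific artifacts mentioned in the abstract tend to originate.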
The ability of language to perpetuate inequality is most evident when individuals refer to, or talk about, other individuals in their utterances. While current studies of bias in NLP rely mainly on identifying hate speech or bias towards a specific group, we believe we can reach a more subtle and nuanced understanding of the interaction between bias and language use by modeling the speaker, the text, and the target in the text. In this paper, we introduce a dataset of 3,033 English tweets by US Congress members annotated for interpersonal emotion, together with 'found supervision' for interpersonal group membership labels. We find that negative emotions such as anger and disgust are used predominantly in out-group situations, and are directed predominantly at leaders of the opposing party. While humans can perform better than chance at identifying interpersonal group membership given an utterance, neural models perform much better; furthermore, a shared encoding between interpersonal group membership and interpersonal perceived emotion enabled some performance gains in the latter. This work aims to re-align the study of bias in NLP away from specific instances of bias toward one that encapsulates the relationship between speaker, text, target, and social dynamics. Data and code for this paper are available at https://github.com/venkatasg/Interpersonal-Dynamics
Access to higher education is critical for minority populations and emergent bilingual students. However, the language used by higher education institutions to communicate with prospective students is often too complex; concretely, many institutions in the US publish admissions application instructions far above the average reading level of a typical high school graduate, often near the 13th or 14th grade level. This creates an unnecessary barrier between students and access to higher education. This work aims to tackle this challenge via text simplification. We present PSAT (Professionally Simplified Admissions Texts), a dataset of 112 admissions instructions randomly selected from higher education institutions across the US. These texts are professionally simplified, then verified and accepted by subject-matter experts who are full-time employees in admissions offices at various institutions. Additionally, PSAT comes with manual alignments of 1,883 original-simplified sentence pairs. The result is a first-of-its-kind corpus for the evaluation and fine-tuning of text simplification systems in a high-stakes genre distinct from existing simplification resources.
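For context on the grade-level figures mentioned above, here is a minimal sketch of how a U.S. reading grade level can be estimated with the Flesch-Kincaid formula. The abstract does not state which readability metric was used, so this is only one standard possibility, and the syllable counter is deliberately rough.

```python
# Minimal sketch: Flesch-Kincaid grade level, one common way to estimate the
# U.S. grade level of a text. The abstract does not specify its metric; this
# is an illustration, and the syllable heuristic is only approximate.
import re

def count_syllables(word: str) -> int:
    """Very rough syllable count: runs of vowels, with a silent-'e' adjustment."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    # FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

# An illustrative, made-up admissions-style sentence; dense nominal phrasing
# like this is what pushes instructions toward the 13th-14th grade range.
print(round(flesch_kincaid_grade(
    "Submit official transcripts from each postsecondary institution attended."), 1))
```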
Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks, as they are not designed to reason about edits. To address this, we propose a novel pretraining objective that explicitly models edits, and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming pure generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a pure generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance on all three downstream editing tasks.
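The abstract mentions combining a pure generation model with the edit-based model through simple reranking. Below is a minimal sketch of one such strategy: candidates produced by one model are rescored with another seq2seq model's length-normalized log-likelihood, and the top-scoring candidate is kept. The model checkpoints and the scoring rule are assumptions for illustration; the paper's actual reranking strategies may differ.

```python
# Minimal reranking sketch (an assumed strategy, not necessarily CoditT5's):
# rescore one model's candidates with a second seq2seq model's average
# per-token log-likelihood and keep the best-scoring candidate.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def candidate_score(model, tokenizer, source: str, candidate: str) -> float:
    """Average per-token log-likelihood of `candidate` given `source`."""
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(candidate, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        out = model(**inputs, labels=labels)
    return -out.loss.item()  # loss is mean negative log-likelihood per label token

def rerank(source: str, candidates: list[str], scorer, scorer_tok) -> str:
    return max(candidates, key=lambda c: candidate_score(scorer, scorer_tok, source, c))

# Illustrative usage with a publicly available code model as the scorer
# (a stand-in checkpoint, assumed here for demonstration):
# scorer_tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
# scorer = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")
# best = rerank(buggy_code, generated_candidates, scorer, scorer_tok)
```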
Developing methods to adversarially challenge NLP systems is a promising avenue for improving both model performance and interpretability. Here, we describe the approach of the team "longhorns" on Task 1 of the First Workshop on Dynamic Adversarial Data Collection (DADC), which asked teams to manually fool a model on an Extractive Question Answering task. Our team finished first, with a model error rate of 62%. We advocate for a systematic, linguistically informed approach to formulating adversarial questions, and we describe the results of our pilot experiments as well as our official submission.
Progress in summarizing long texts is inhibited by the lack of appropriate evaluation frameworks. A long summary that appropriately covers the facets of a long source text must also present a coherent narrative to be understandable to a reader, but current automatic and human evaluation methods fail to identify gaps in coherence. In this work, we introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries. We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries. Our work provides the first characterization of coherence errors produced by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators. Furthermore, we show that the collected annotations allow us to train a strong classifier for automatically localizing coherence errors in generated summaries, as well as to benchmark past work in coherence modeling. Finally, our SNaC framework can support future work in long document summarization and coherence evaluation, including improved summarization modeling and post-hoc summary correction.
Automated simplification models aim to make input texts more readable. Such methods have the potential to make complex information accessible to a wider audience, e.g., providing access to recent medical literature which might otherwise be impenetrable for a lay reader. However, such models risk introducing errors into automatically simplified texts, for instance by inserting statements unsupported by the corresponding original text, or by omitting key information. Providing more readable but inaccurate versions of texts may in many cases be worse than providing no such access at all. The problem of factual accuracy (and the lack thereof) has received heightened attention in the context of summarization models, but the factuality of automatically simplified texts has not been investigated. We introduce a taxonomy of errors that we use to analyze both references drawn from standard simplification datasets and state-of-the-art model outputs. We find that errors not captured by existing evaluation metrics often appear in both, motivating research into ensuring the factual accuracy of automated simplification models.