
Arkadiy Saakyan


ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer

Sep 15, 2023
Arkadiy Saakyan, Smaranda Muresan

While state-of-the-art language models excel at the style transfer task, current work does not address explainability of style transfer systems. Explanations could be generated using large language models such as GPT-3.5 and GPT-4, but the use of such complex systems is inefficient when smaller, widely distributed, and transparent alternatives are available. We propose a framework to augment and improve a formality style transfer dataset with explanations via model distillation from ChatGPT. To further refine the generated explanations, we propose a novel way to incorporate scarce expert human feedback using in-context learning (ICLEF: In-Context Learning from Expert Feedback) by prompting ChatGPT to act as a critic to its own outputs. We use the resulting dataset of 9,960 explainable formality style transfer instances (e-GYAFC) to show that current openly distributed instruction-tuned models (and, in some settings, ChatGPT) perform poorly on the task, and that fine-tuning on our high-quality dataset leads to significant improvements as shown by automatic evaluation. In human evaluation, we show that models much smaller than ChatGPT fine-tuned on our data align better with expert preferences. Finally, we discuss two potential applications of models fine-tuned on the explainable style transfer task: interpretable authorship verification and interpretable adversarial attacks on AI-generated text detectors.
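
The self-critique loop described above can be pictured with a short sketch, assuming the OpenAI Python client; the prompt wording, the few-shot format, and the function names are illustrative assumptions, not the paper's actual prompts or code.

```python
# Sketch of ICLEF-style self-critique: the model that produced a draft
# explanation is re-prompted, with a handful of expert critiques in context,
# to act as a critic of its own output. All prompt text is illustrative.
from openai import OpenAI

client = OpenAI()

def critique_explanation(informal, formal, draft_explanation, expert_examples):
    # expert_examples: few-shot (explanation, critique) pairs that inject
    # scarce expert human feedback via in-context learning.
    shots = "\n\n".join(
        f"Explanation: {e}\nExpert critique: {c}" for e, c in expert_examples
    )
    prompt = (
        "You are an expert reviewer of formality style transfer explanations.\n\n"
        f"{shots}\n\n"
        f"Informal: {informal}\nFormal: {formal}\n"
        f"Explanation: {draft_explanation}\nExpert critique:"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```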


I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors

May 24, 2023
Tuhin Chakrabarty, Arkadiy Saakyan, Olivia Winn, Artemis Panagopoulou, Yue Yang, Marianna Apidianaki, Smaranda Muresan


Visual metaphors are powerful rhetorical devices used to persuade or communicate creative ideas through images. Similar to linguistic metaphors, they convey meaning implicitly through symbolism and juxtaposition of the symbols. We propose a new task of generating visual metaphors from linguistic metaphors. This is a challenging task for diffusion-based text-to-image models, such as DALL·E 2, since it requires the ability to model implicit meaning and compositionality. We propose to solve the task through the collaboration between Large Language Models (LLMs) and Diffusion Models: Instruct GPT-3 (davinci-002) with Chain-of-Thought prompting generates text that represents a visual elaboration of the linguistic metaphor containing the implicit meaning and relevant objects, which is then used as input to the diffusion-based text-to-image models. Using a human-AI collaboration framework, where humans interact both with the LLM and the top-performing diffusion model, we create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations. Evaluation by professional illustrators shows the promise of LLM-Diffusion Model collaboration for this task. To evaluate the utility of our Human-AI collaboration framework and the quality of our dataset, we perform both an intrinsic human-based evaluation and an extrinsic evaluation using visual entailment as a downstream task.
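
The two-stage pipeline can be sketched as below, under assumptions: a chat LLM stands in for Instruct GPT-3, and Stable Diffusion (via the diffusers library) stands in for the text-to-image model; the elaboration prompt is invented for illustration.

```python
# Sketch of the LLM + diffusion pipeline: the LLM unpacks the metaphor's
# implicit meaning into a literal scene description, which becomes the
# prompt for the text-to-image model. Model and prompt choices are stand-ins.
from openai import OpenAI
from diffusers import StableDiffusionPipeline

client = OpenAI()

def elaborate(metaphor: str) -> str:
    # Chain-of-thought-style request: reason about the implied meaning,
    # then produce a concrete, drawable scene.
    prompt = (
        "Explain step by step what this metaphor implies, then write one "
        f"short, literal scene description an illustrator could draw:\n{metaphor}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe(elaborate("My lawyer is a shark.")).images[0]
image.save("visual_metaphor.png")
```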

* ACL 2023 (Findings) 

Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment

May 23, 2023
Sky CH-Wang, Arkadiy Saakyan, Oliver Li, Zhou Yu, Smaranda Muresan


Designing systems that can reason across cultures requires that they are grounded in the norms of the contexts in which they operate. However, current research on developing computational models of social norms has primarily focused on American society. Here, we propose a novel approach to discover and compare descriptive social norms across Chinese and American cultures. We demonstrate our approach by leveraging discussions on a Chinese Q&A platform, Zhihu, and the existing Social Chemistry dataset as proxies for contrasting cultural axes, aligning social situations cross-culturally, and extracting social norms from texts using in-context learning. Embedding Chain-of-Thought prompting in a human-AI collaborative framework, we build a high-quality dataset of 3,069 social norms aligned with social situations across Chinese and American cultures, alongside corresponding free-text explanations. To test the ability of models to reason about social norms across cultures, we introduce the task of explainable social norm entailment, showing that existing models under 3B parameters have significant room for improvement in both automatic and human evaluation. Further analysis of cross-cultural norm differences based on our dataset shows empirical alignment with the social orientations framework, revealing several situational and descriptive nuances in norms across these cultures.
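
As a rough illustration of the explainable social norm entailment task, the sketch below formats a cross-cultural norm pair for a small seq2seq model; the model choice (a sub-3B Flan-T5), the input template, and the example norms are assumptions, not the paper's data or setup.

```python
# Sketch: explainable social norm entailment as a seq2seq problem. The model
# must output a label plus a free-text explanation. Template is illustrative.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

premise = "In the US, it is customary to tip restaurant servers."
hypothesis = "In China, tipping at restaurants is generally not expected."
inputs = tokenizer(
    f"premise: {premise} hypothesis: {hypothesis} "
    "Do these norms agree or differ? Answer and explain.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```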


FLUTE: Figurative Language Understanding and Textual Explanations

May 24, 2022
Tuhin Chakrabarty, Arkadiy Saakyan, Debanjan Ghosh, Smaranda Muresan


In spite of the prevalence of figurative language, transformer-based models struggle to demonstrate an understanding of it. Meanwhile, even classical natural language inference (NLI) tasks have been plagued by spurious correlations and annotation artifacts. Datasets like eSNLI have been released, allowing researchers to probe whether language models are right for the right reasons. Yet no such data exists for figurative language, making it harder to assess genuine understanding of such expressions. In light of the above, we release FLUTE, a dataset of 8,000 figurative NLI instances with explanations, spanning three categories: Sarcasm, Simile, and Metaphor. We collect the data through a Human-AI collaboration framework based on GPT-3, crowdworkers, and expert annotation. We show how utilizing GPT-3 in conjunction with human experts can aid in scaling up the creation of datasets even for such complex linguistic phenomena as figurative language. Baseline performance of the T5 model shows that our dataset is a challenging testbed for figurative language understanding.
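
A FLUTE-style instance, in the seq2seq format a T5 baseline might consume, could look like the sketch below; the field names, prompt template, and example are invented for illustration, not drawn from the dataset.

```python
# Illustrative figurative-NLI-with-explanation instance and its seq2seq
# formatting. The example and template are assumptions, not FLUTE data.
instance = {
    "premise": "She devoured the novel in one sitting.",
    "hypothesis": "She read the novel slowly and reluctantly.",
    "label": "Contradiction",
    "explanation": (
        "To devour a book is a metaphor for reading it eagerly and quickly, "
        "which contradicts reading it slowly and reluctantly."
    ),
}

# Input asks the model for the label and the explanation in one sequence.
source = (
    f"premise: {instance['premise']} hypothesis: {instance['hypothesis']} "
    "Is this entailment or contradiction? Explain."
)
target = f"{instance['label']}. Explanation: {instance['explanation']}"
```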

* Work in progress 

Don't Go Far Off: An Empirical Study on Neural Poetry Translation

Sep 10, 2021
Tuhin Chakrabarty, Arkadiy Saakyan, Smaranda Muresan


Despite constant improvements in machine translation quality, automatic poetry translation remains a challenging problem due to the lack of open-sourced parallel poetic corpora, and to the intrinsic complexities involved in preserving the semantics, style, and figurative nature of poetry. We present an empirical investigation for poetry translation along several dimensions: 1) size and style of training data (poetic vs. non-poetic), including a zero-shot setup; 2) bilingual vs. multilingual learning; and 3) language-family-specific models vs. mixed-multilingual models. To accomplish this, we contribute a parallel dataset of poetry translations for several language pairs. Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35X larger in size, both in terms of automatic metrics (BLEU, BERTScore) and human evaluation metrics such as faithfulness (meaning and poetic style). Moreover, multilingual fine-tuning on poetic data outperforms bilingual fine-tuning on poetic data.
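
The automatic metrics named above (BLEU, BERTScore) can be computed along the lines of this sketch; the sacrebleu and bert-score libraries are assumed stand-ins for whatever tooling the paper used, and the data is dummy.

```python
# Sketch of automatic evaluation for poetry translation: corpus BLEU and
# BERTScore over model outputs vs. reference translations.
import sacrebleu
from bert_score import score

hypotheses = ["Do not go far off, not even for a day."]   # model translations
references = ["Don't go far off, not even for a day."]    # human references

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
P, R, F1 = score(hypotheses, references, lang="en")

print(f"BLEU: {bleu.score:.2f}  BERTScore F1: {F1.mean().item():.4f}")
```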

* EMNLP 2021 Camera ready 

COVID-Fact: Fact Extraction and Verification of Real-World Claims on COVID-19 Pandemic

Jun 07, 2021
Arkadiy Saakyan, Tuhin Chakrabarty, Smaranda Muresan


We introduce COVID-Fact, a FEVER-like dataset of 4,086 claims concerning the COVID-19 pandemic. The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence. Unlike previous approaches, we automatically detect true claims and their source articles and then generate counter-claims using automatic methods rather than employing human annotators. Along with our constructed resource, we formally present the task of identifying relevant evidence for the claims and verifying whether the evidence refutes or supports a given claim. In addition to scientific claims, our data contains simplified general claims from media sources, making it better suited for detecting general misinformation regarding COVID-19. Our experiments indicate that COVID-Fact will provide a challenging testbed for the development of new systems, and our approach will reduce the costs of building domain-specific datasets for detecting misinformation.
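
The verification subtask has the shape of natural language inference, as the hedged sketch below shows with an off-the-shelf MNLI model; this stand-in classifier, the label mapping, and the example pair are assumptions, not the systems evaluated in the paper.

```python
# Sketch: verify a claim against retrieved evidence by treating the pair as
# NLI and mapping entailment/contradiction to SUPPORTED/REFUTED. The MNLI
# model is an off-the-shelf stand-in, not the paper's trained system.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

evidence = "The trial found that masks significantly reduced viral transmission."
claim = "Masks do not reduce the transmission of the virus."

result = nli({"text": evidence, "text_pair": claim})
verdict = {"ENTAILMENT": "SUPPORTED", "CONTRADICTION": "REFUTED"}.get(
    result["label"], "NOT ENOUGH INFO"
)
print(verdict, round(result["score"], 3))
```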

* ACL 2021 Camera Ready 