Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yoav Artzi

Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs

Aug 02, 2024

Yilun Hua, Yoav Artzi

Abstract:Humans spontaneously use increasingly efficient language as interactions progress, by adapting and forming ad-hoc conventions. This phenomenon has been studied extensively using reference games, showing properties of human language that go beyond relaying intents. It remains unexplored whether multimodal large language models (MLLMs) similarly increase communication efficiency during interactions, and what mechanisms they may adopt for this purpose. We introduce ICCA, an automated framework to evaluate such conversational adaptation as an in-context behavior in MLLMs. We evaluate several state-of-the-art MLLMs, and observe that while they may understand the increasingly efficient language of their interlocutor, they do not spontaneously make their own language more efficient over time. This latter ability can only be elicited in some models (e.g., GPT-4) with heavy-handed prompting. This shows that this property of linguistic interaction does not arise from current training regimes, even though it is a common hallmark of human language. ICCA is available at https://github.com/lil-lab/ICCA.

* Accepted to COLM 2024

Via

Access Paper or Ask Questions

A Surprising Failure? Multimodal LLMs and the NLVR Challenge

Feb 26, 2024

Anne Wu, Kianté Brantley, Yoav Artzi

Abstract:This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and the open-source model IDEFICS -- on the compositional natural language vision reasoning task NLVR. Given a human-written sentence paired with a synthetic image, this task requires the model to determine the truth value of the sentence with respect to the image. Despite the strong performance demonstrated by these models, we observe they perform poorly on NLVR, which was constructed to require compositional and spatial reasoning, and to be robust for semantic and systematic biases.

Via

Access Paper or Ask Questions

A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models

Sep 06, 2023

Noriyuki Kojima, Hadar Averbuch-Elor, Yoav Artzi

Abstract:Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take place if the task is addressed in a way that is conductive to generalization. We propose a framework to jointly study task performance and phrase grounding, and propose three benchmarks to study the relation between the two. Our results show that contemporary models demonstrate inconsistency between their ability to ground phrases and solve tasks. We show how this can be addressed through brute-force training on ground phrasing annotations, and analyze the dynamics it creates. Code and at available at https://github.com/lil-lab/phrase_grounding.

Via

Access Paper or Ask Questions

IncDSI: Incrementally Updatable Document Retrieval

Jul 19, 2023

Varsha Kishore, Chao Wan, Justin Lovelace, Yoav Artzi, Kilian Q. Weinberger

Figure 1 for IncDSI: Incrementally Updatable Document Retrieval

Figure 2 for IncDSI: Incrementally Updatable Document Retrieval

Figure 3 for IncDSI: Incrementally Updatable Document Retrieval

Figure 4 for IncDSI: Incrementally Updatable Document Retrieval

Abstract:Differentiable Search Index is a recently proposed paradigm for document retrieval, that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performances for document retrieval across many benchmarks. These kinds of models have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50ms per document), without retraining the model on the entire dataset (or even parts thereof). Instead we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with re-training the model on the whole dataset and enables the development of document retrieval systems that can be updated with new information in real-time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.

Via

Access Paper or Ask Questions

Continually Improving Extractive QA via Human Feedback

May 21, 2023

Ge Gao, Hung-Ting Chen, Yoav Artzi, Eunsol Choi

Figure 1 for Continually Improving Extractive QA via Human Feedback

Figure 2 for Continually Improving Extractive QA via Human Feedback

Figure 3 for Continually Improving Extractive QA via Human Feedback

Figure 4 for Continually Improving Extractive QA via Human Feedback

Abstract:We study continually improving an extractive question answering (QA) system via human user feedback. We design and deploy an iterative approach, where information-seeking users ask questions, receive model-predicted answers, and provide feedback. We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time. Our experiments show effective improvement from user feedback of extractive QA models over time across different data regimes, including significant potential for domain adaptation.

Via

Access Paper or Ask Questions

Semantic uncertainty guides the extension of conventions to new referents

May 11, 2023

Ron Eliav, Anya Ji, Yoav Artzi, Robert D. Hawkins

Figure 1 for Semantic uncertainty guides the extension of conventions to new referents

Figure 2 for Semantic uncertainty guides the extension of conventions to new referents

Figure 3 for Semantic uncertainty guides the extension of conventions to new referents

Figure 4 for Semantic uncertainty guides the extension of conventions to new referents

Abstract:A long tradition of studies in psycholinguistics has examined the formation and generalization of ad hoc conventions in reference games, showing how newly acquired conventions for a given target transfer to new referential contexts. However, another axis of generalization remains understudied: how do conventions formed for one target transfer to completely distinct targets, when specific lexical choices are unlikely to repeat? This paper presents two dyadic studies (N = 240) that address this axis of generalization, focusing on the role of nameability -- the a priori likelihood that two individuals will share the same label. We leverage the recently-released KiloGram dataset, a collection of abstract tangram images that is orders of magnitude larger than previously available, exhibiting high diversity of properties like nameability. Our first study asks how nameability shapes convention formation, while the second asks how new conventions generalize to entirely new targets of reference. Our results raise new questions about how ad hoc conventions extend beyond target-specific re-use of specific lexical choices.

* Proceedings of the 45th Annual Conference of the Cognitive Science Society

Via

Access Paper or Ask Questions

CB2: Collaborative Natural Language Interaction Research Platform

Mar 14, 2023

Jacob Sharf, Mustafa Omer Gul, Yoav Artzi

Figure 1 for CB2: Collaborative Natural Language Interaction Research Platform

Figure 2 for CB2: Collaborative Natural Language Interaction Research Platform

Figure 3 for CB2: Collaborative Natural Language Interaction Research Platform

Figure 4 for CB2: Collaborative Natural Language Interaction Research Platform

Abstract:CB2 is a multi-agent platform to study collaborative natural language interaction in a grounded task-oriented scenario. It includes a 3D game environment, a backend server designed to serve trained models to human agents, and various tools and processes to enable scalable studies. We deploy CB2 at https://cb2.ai as a system demonstration with a learned instruction following model.

Via

Access Paper or Ask Questions

Continual Learning for Instruction Following from Realtime Feedback

Dec 19, 2022

Alane Suhr, Yoav Artzi

Abstract:We study the problem of continually training an instruction-following agent through feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent's instruction execution. We cast learning as a contextual bandit problem, converting the user feedback to immediate reward. We evaluate through multiple rounds of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution over time. We also show our approach is robust to several design variations, and that the feedback signal is roughly equivalent to the learning signal of supervised demonstration data.

Via

Access Paper or Ask Questions

Abstract Visual Reasoning with Tangram Shapes

Nov 29, 2022

Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi

Figure 1 for Abstract Visual Reasoning with Tangram Shapes

Figure 2 for Abstract Visual Reasoning with Tangram Shapes

Figure 3 for Abstract Visual Reasoning with Tangram Shapes

Figure 4 for Abstract Visual Reasoning with Tangram Shapes

Abstract:We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with >1k distinct stimuli, is orders of magnitude larger and more diverse than prior resources. It is both visually and linguistically richer, moving beyond whole shape descriptions to include segmentation maps and part labels. We use this resource to evaluate the abstract visual reasoning capacities of recent multi-modal models. We observe that pre-trained weights demonstrate limited abstract reasoning, which dramatically improves with fine-tuning. We also observe that explicitly describing parts aids abstract reasoning for both humans and models, especially when jointly encoding the linguistic and visual inputs. KiloGram is available at https://lil.nlp.cornell.edu/kilogram .

* EMNLP 2022 long paper

Via

Access Paper or Ask Questions

lilGym: Natural Language Visual Reasoning with Reinforcement Learning

Nov 03, 2022

Anne Wu, Kianté Brantley, Noriyuki Kojima, Yoav Artzi

Abstract:We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments. lilGym is based on 2,661 highly-compositional human-written natural language statements grounded in an interactive visual environment. We annotate all statements with executable Python programs representing their meaning to enable exact reward computation in every possible world state. Each statement is paired with multiple start states and reward functions to form thousands of distinct Markov Decision Processes of varying difficulty. We experiment with lilGym with different models and learning regimes. Our results and analysis show that while existing methods are able to achieve non-trivial performance, lilGym forms a challenging open problem. lilGym is available at https://lil.nlp.cornell.edu/lilgym/.

Via

Access Paper or Ask Questions