Laure Soulier

Investigating the impact of 2D gesture representation on co-speech gesture generation

Jun 24, 2024

Teo Guichoux, Laure Soulier, Nicolas Obin, Catherine Pelachaud

Abstract: Co-speech gestures play a crucial role in the interactions between humans and embodied conversational agents (ECAs). Recent deep learning methods enable the generation of realistic, natural co-speech gestures synchronized with speech, but such approaches require large amounts of training data. "In-the-wild" datasets, which compile videos from sources such as YouTube through human pose detection models, offer a solution by providing 2D skeleton sequences that are paired with speech. Concurrently, innovative lifting models have emerged, capable of transforming these 2D pose sequences into their 3D counterparts, leading to large and diverse datasets of 3D gestures. However, the derived 3D pose estimation is essentially a pseudo-ground truth, with the actual ground truth being the 2D motion data. This distinction raises questions about the impact of gesture representation dimensionality on the quality of generated motions, a topic that, to our knowledge, remains largely unexplored. In this work, we evaluate the impact of the dimensionality of the training data, 2D or 3D joint coordinates, on the performance of a multimodal speech-to-gesture deep generative model. We use a lifting model to convert 2D-generated sequences of body pose to 3D. Then, we compare the sequence of gestures generated directly in 3D to the gestures generated in 2D and lifted to 3D as post-processing.

* 8 pages. Paper accepted at WACAI 2024

What Makes Multimodal In-Context Learning Work?

Apr 25, 2024

Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski

Abstract: Large Language Models have demonstrated remarkable performance across various tasks, exhibiting the capacity to swiftly acquire new skills, such as through In-Context Learning (ICL) with minimal demonstration examples. In this work, we present a comprehensive framework for investigating Multimodal ICL (M-ICL) in the context of Large Multimodal Models. We consider the best open-source multimodal models (e.g., IDEFICS, OpenFlamingo) and a wide range of multimodal tasks. Our study unveils several noteworthy findings: (1) M-ICL primarily relies on text-driven mechanisms, showing little to no influence from the image modality. (2) When used with an advanced ICL strategy (such as RICES), M-ICL is not better than a simple strategy based on majority voting over context examples. Moreover, we identify several biases and limitations of M-ICL that warrant consideration prior to deployment. Code available at https://gitlab.com/folbaeni/multimodal-icl

* 20 pages, 16 figures. Accepted to CVPR 2024 Workshop on Prompting in Vision. Project page: https://folbaeni.gitlab.io/multimodal-icl
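
The majority-voting baseline mentioned in finding (2) can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation (their code is at the GitLab link above): the function name `majority_vote` and the toy demonstration labels are assumptions. The baseline ignores the query and the images entirely and simply predicts the most frequent label among the context examples.

```python
from collections import Counter


def majority_vote(context_labels):
    """Return the most frequent label among the in-context demonstrations.

    Hypothetical helper: ties are broken in favour of the label that
    appears first in the context, which is how Counter.most_common orders
    equal counts.
    """
    if not context_labels:
        raise ValueError("need at least one context example")
    counts = Counter(context_labels)
    return counts.most_common(1)[0][0]


# Toy context: (image id, label) pairs standing in for retrieved demonstrations.
demos = [("img_001", "cat"), ("img_002", "dog"), ("img_003", "cat")]
prediction = majority_vote([label for _, label in demos])
```

A baseline like this is useful precisely because it involves no model at all: if M-ICL with retrieved demonstrations does not beat it, the gain is likely coming from the labels in the context rather than from multimodal reasoning.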

PAQA: Toward ProActive Open-Retrieval Question Answering

Feb 26, 2024

Pierre Erbacher, Jian-Yun Nie, Philippe Preux, Laure Soulier

Abstract: Conversational systems have made significant progress in generating natural language responses. However, their potential as conversational search systems is currently limited due to their passive role in the information-seeking process. One major limitation is the scarcity of datasets that provide labelled ambiguous questions along with a supporting corpus of documents and relevant clarifying questions. This work aims to tackle the challenge of generating relevant clarifying questions by taking into account the inherent ambiguities present in both user queries and documents. To achieve this, we propose PAQA, an extension to the existing AmbiNQ dataset, incorporating clarifying questions. We then evaluate various models and assess how passage retrieval impacts ambiguity detection and the generation of clarifying questions. By addressing this gap in conversational search systems, we aim to provide additional supervision to enhance their active participation in the information-seeking process and provide users with more accurate results.

LOCOST: State-Space Models for Long Document Abstractive Summarization

Jan 31, 2024

Florian Le Bronnec, Song Duong, Mathieu Ravaut, Alexandre Allauzen, Nancy F. Chen, Vincent Guigue, Alberto Lumbreras, Laure Soulier, Patrick Gallinari

Abstract: State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches a performance level that is 93-96% comparable to the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.

* 9 pages, 5 figures, 7 tables, EACL 2024 conference
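
To give some intuition for the $O(L \log L)$ cost quoted in the abstract: a common way for state-space layers to reach it is to apply the layer as one long causal convolution and evaluate that convolution with the FFT. The abstract does not say how LOCOST computes its layers, so the sketch below is a generic illustration under that assumption. The function names, the toy signal, and the decaying kernel are all hypothetical, and the recursive FFT is written for clarity rather than speed.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def causal_conv_fft(signal, kernel):
    """Causal convolution in O(L log L) via zero-padded FFTs.

    Assumes len(kernel) <= len(signal).
    """
    length = len(signal)
    size = 1
    while size < 2 * length:  # pad so circular wrap-around cannot occur
        size *= 2
    fs = fft([complex(x) for x in signal] + [0j] * (size - length))
    fk = fft([complex(k) for k in kernel] + [0j] * (size - len(kernel)))
    product = [a * b for a, b in zip(fs, fk)]
    result = fft(product, inverse=True)
    # The inverse transform above is unnormalised, so divide by size here.
    return [(v / size).real for v in result[:length]]


def causal_conv_direct(signal, kernel):
    """Reference O(L^2) implementation of the same causal convolution."""
    return [
        sum(kernel[j] * signal[t - j] for j in range(min(t + 1, len(kernel))))
        for t in range(len(signal))
    ]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.5, 0.25, 0.125]  # toy decaying kernel, not a trained one
fast = causal_conv_fft(signal, kernel)
slow = causal_conv_direct(signal, kernel)
```

Both functions return the same values up to floating-point error; only the cost differs, which is what matters once the sequence length reaches hundreds of thousands of tokens.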

Simple Domain Adaptation for Sparse Retrievers

Jan 21, 2024

Mathias Vast, Yuxuan Zong, Basile Van Cooten, Benjamin Piwowarski, Laure Soulier

Abstract: In Information Retrieval, and more generally in Natural Language Processing, adapting models to specific domains is conducted through fine-tuning. Despite the successes achieved by this method and its versatility, the need for human-curated and labeled data makes it impractical to transfer to new tasks, domains, and/or languages when training data doesn't exist. Using the model without training (zero-shot) is another option, which however comes at a cost in effectiveness, especially in the case of first-stage retrievers. Numerous research directions have emerged to tackle these issues, most of them in the context of adapting to a task or a language. However, the literature is scarcer for domain (or topic) adaptation. In this paper, we address this issue of cross-topic discrepancy for a sparse first-stage retriever by transposing a method initially designed for language adaptation. By leveraging pre-training on the target data to learn domain-specific knowledge, this technique alleviates the need for annotated data and expands the scope of domain adaptation. Despite their relatively good generalization ability, we show that even sparse retrievers can benefit from our simple domain adaptation method.

* Accepted at ECIR 2024

Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering

Jan 03, 2024

Pierre Erbacher, Louis Falissard, Vincent Guigue, Laure Soulier

Abstract: While Large Language Models (LLMs) are able to accumulate and restore knowledge, they are still prone to hallucination. Especially when faced with factual questions, LLMs cannot rely solely on knowledge stored in parameters to guarantee truthful and correct answers. Augmenting these models with the ability to search external information sources, such as the web, is a promising approach to ground knowledge to retrieve information. However, searching in a large collection of documents introduces additional computational/time costs. An optimal behavior would be to query external resources only when the LLM is not confident about answers. In this paper, we propose a new LLM able to self-estimate whether it is able to answer directly or needs to request an external tool. We investigate a supervised approach by introducing a hallucination masking mechanism in which labels are generated using a closed-book question-answering task. In addition, we propose to leverage parameter-efficient fine-tuning techniques to train our model on a small amount of data. Our model directly provides answers for $78.2\%$ of the known queries and opts to search for $77.2\%$ of the unknown ones. This results in the API being utilized only $62\%$ of the time.

Augmenting Ad-Hoc IR Dataset for Interactive Conversational Search

Nov 10, 2023

Pierre Erbacher, Jian-Yun Nie, Philippe Preux, Laure Soulier

Abstract: A peculiarity of conversational search systems is that they involve mixed-initiative interactions such as system-generated query-clarifying questions. Evaluating those systems at a large scale on the end task of IR is very challenging, requiring adequate datasets containing such interactions. However, current datasets only focus on either traditional ad-hoc IR tasks or query clarification tasks, the latter being usually seen as a reformulation task from the initial query. The only two datasets known to us that contain both document relevance judgments and the associated clarification interactions are Qulac and ClariQ. Both are based on the TREC Web Track 2009-12 collection, but cover a very limited number of topics (237 topics), far from being enough for training and testing conversational IR models. To fill the gap, we propose a methodology to automatically build large-scale conversational IR datasets from ad-hoc IR datasets in order to facilitate explorations on conversational IR. Our methodology is based on two processes: 1) generating query clarification interactions through query clarification and answer generators, and 2) augmenting ad-hoc IR datasets with simulated interactions. In this paper, we focus on MS MARCO and augment it with query clarification and answer simulations. We perform a thorough evaluation showing the quality and the relevance of the generated interactions for each initial query. This paper shows the feasibility and utility of augmenting ad-hoc IR datasets for conversational IR.

CIRCLE: Multi-Turn Query Clarifications with Reinforcement Learning

Nov 05, 2023

Pierre Erbacher, Laure Soulier

Abstract: Users often have trouble formulating their information needs into words on the first try when searching online. This can lead to frustration, as they may have to reformulate their queries when retrieved information is not relevant. This can be due to a lack of familiarity with the specific terminology related to their search topic, or because queries are ambiguous and related to multiple topics. Most modern search engines have interactive features that suggest clarifications or similar queries based on what others have searched for. However, the proposed models are either based on a single interaction or evaluated on search logs, hindering the naturalness of the interactions. In this paper, we introduce CIRCLE, a generative model for multi-turn query Clarifications wIth ReinforCement LEarning that leverages multi-turn interactions through a user simulation framework. Our model aims at generating a diverse set of query clarifications using a pretrained language model fine-tuned using reinforcement learning. We evaluate it against well-established Google suggestions using a user simulation framework.

Improving generalization in large language models by learning prefix subspaces

Oct 24, 2023

Louis Falissard, Vincent Guigue, Laure Soulier

Abstract: This article focuses on fine-tuning large language models (LLMs) in the scarce data regime (also known as the "few-shot" learning setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Its adaptation to massive, pretrained transformers, however, poses some challenges. First, their considerable number of parameters makes it difficult to train several models jointly, and second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that "Parameter Efficient Fine-Tuning" (PEFT) methods, however, are perfectly compatible with this original approach, and propose to learn an entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both our contributions jointly lead to a gain in average performance compared to state-of-the-art methods. The implementation can be found at the following link: https://github.com/Liloulou/prefix_subspace
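
The idea of learning a simplex of prefixes can be illustrated with a small sketch. It is not taken from the linked repository: the helper names, the toy two-dimensional "prefixes", the uniform sampling scheme (normalised exponential draws, i.e. a flat Dirichlet), and the use of the centroid at inference time are all assumptions made for illustration. Each training step would draw a random point inside the simplex and use the corresponding convex combination of the vertex prefixes.

```python
import random


def sample_simplex_weights(n_vertices, rng):
    """Draw convex-combination weights uniformly from the simplex.

    Normalising independent Exp(1) draws gives a flat Dirichlet sample.
    """
    draws = [rng.expovariate(1.0) for _ in range(n_vertices)]
    total = sum(draws)
    return [d / total for d in draws]


def combine_prefixes(vertices, weights):
    """Weighted sum of the vertex prefixes (each a flat list of floats)."""
    dim = len(vertices[0])
    return [
        sum(w * vertex[i] for w, vertex in zip(weights, vertices))
        for i in range(dim)
    ]


rng = random.Random(0)
# Three hypothetical learned prefixes, flattened to 2 values each.
vertices = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]

# Training-time view: a fresh random point of the simplex per step.
train_weights = sample_simplex_weights(len(vertices), rng)
train_prefix = combine_prefixes(vertices, train_weights)

# Inference-time view (assumed): the centroid of the simplex.
centroid = combine_prefixes(vertices, [1.0 / len(vertices)] * len(vertices))
```

Because only the prefixes are duplicated, not the frozen backbone, keeping several vertices costs little extra memory, which is the compatibility with PEFT that the abstract points to.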

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Jun 07, 2023

Alexandre Rame, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord

Abstract: Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further align the network with the intended usage. Yet the imperfections in the proxy reward may hinder the training and lead to suboptimal results; the diversity of objectives in real-world tasks and human opinions exacerbates the issue. This paper proposes embracing the heterogeneity of diverse rewards by following a multi-policy strategy. Rather than focusing on a single a priori reward, we aim for Pareto-optimal generalization across the entire space of preferences. To this end, we propose rewarded soup, first specializing multiple networks independently (one for each proxy reward) and then interpolating their weights linearly. This succeeds empirically because we show that the weights remain linearly connected when fine-tuned on diverse rewards from a shared pre-trained initialization. We demonstrate the effectiveness of our approach for text-to-text (summarization, Q&A, helpful assistant, review), text-image (image captioning, text-to-image generation, visual grounding, VQA), and control (locomotion) tasks. We hope to enhance the alignment of deep models, and how they interact with the world in all its diversity.
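
The linear weight interpolation at the core of rewarded soup can be sketched as follows. This is a minimal illustration, not the authors' code: `interpolate_weights` is a hypothetical helper, and plain Python lists stand in for the parameter tensors of networks fine-tuned on different rewards from the same pre-trained initialization.

```python
def interpolate_weights(state_dicts, coefficients):
    """Linearly interpolate several models' parameters.

    state_dicts: list of {parameter name: list of floats}, all sharing the
                 same names and shapes (toy stand-in for real tensors).
    coefficients: one non-negative weight per model, summing to 1.
    """
    if len(state_dicts) != len(coefficients):
        raise ValueError("need one coefficient per model")
    if abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    soup = {}
    for name, reference in state_dicts[0].items():
        soup[name] = [
            sum(c * sd[name][i] for c, sd in zip(coefficients, state_dicts))
            for i in range(len(reference))
        ]
    return soup


# Two toy "networks", each imagined as fine-tuned on a different proxy reward.
weights_reward_a = {"layer": [0.0, 2.0]}
weights_reward_b = {"layer": [4.0, 0.0]}

# A user preference of 25% reward A and 75% reward B.
soup = interpolate_weights([weights_reward_a, weights_reward_b], [0.25, 0.75])
```

Sweeping the coefficients from one extreme to the other yields a family of models from a fixed set of fine-tuned networks, with no retraining, which is how a single set of specialized models can cover a range of preferences.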
