Marie-Francine Moens

Sequence-to-Sequence Spanish Pre-trained Language Models

Sep 20, 2023

Vladimir Araujo, Maria Mihaela Trusca, Rodrigo Tufiño, Marie-Francine Moens

Abstract:In recent years, substantial advancements in pre-trained language models have paved the way for the development of numerous non-English language versions, with a particular focus on encoder-only and decoder-only architectures. While Spanish language models encompassing BERT, RoBERTa, and GPT have exhibited prowess in natural language understanding and generation, there remains a scarcity of encoder-decoder models designed for sequence-to-sequence tasks involving input-output pairs. This paper breaks new ground by introducing the implementation and evaluation of renowned encoder-decoder architectures, exclusively pre-trained on Spanish corpora. Specifically, we present Spanish versions of BART, T5, and BERT2BERT-style models and subject them to a comprehensive assessment across a diverse range of sequence-to-sequence tasks, spanning summarization, rephrasing, and generative question answering. Our findings underscore the competitive performance of all models, with BART and T5 emerging as top performers across all evaluated tasks. As an additional contribution, we have made all models publicly available to the research community, fostering future exploration and development in Spanish language processing.

When Do Discourse Markers Affect Computational Sentence Understanding?

Sep 01, 2023

Ruiqi Li, Liesbeth Allein, Damien Sileo, Marie-Francine Moens

Abstract:The capabilities and use cases of automatic natural language processing (NLP) have grown significantly over the last few years. While much work has been devoted to understanding how humans deal with discourse connectives, this phenomenon is understudied in computational systems. Therefore, it is important to put NLP models under the microscope and examine whether they can adequately comprehend, process, and reason within the complexity of natural language. In this chapter, we introduce the main mechanisms behind automatic sentence processing systems step by step and then focus on evaluating discourse connective processing. We assess nine popular systems in their ability to understand English discourse connectives and analyze how context and language understanding tasks affect their connective comprehension. The results show that NLP systems do not process all discourse connectives equally well and that the computational processing complexity of different connective kinds is not always consistently in line with the presumed complexity order found in human processing. In addition, while humans are more inclined to be influenced during the reading procedure but not necessarily in the final comprehension performance, discourse connectives have a significant impact on the final accuracy of NLP systems. The richer the knowledge of connectives a system learns, the more negative the effect inappropriate connectives have on it. This suggests that the correct explicitation of discourse connectives is important for computational natural language processing.

* Trends in Linguistics. Studies and Monographs, 2022
* Chapter 7 of Discourse Markers in Interaction, published in Trends in Linguistics. Studies and Monographs

Beyond Document Page Classification: Design, Datasets, and Challenges

Aug 29, 2023

Jordy Van Landeghem, Sanket Biswas, Matthew B. Blaschko, Marie-Francine Moens

Abstract:This paper highlights the need to bring document classification benchmarking closer to real-world applications, both in the nature of data tested ($X$: multi-channel, multi-paged, multi-industry; $Y$: class distributions and label set variety) and in classification tasks considered ($f$: multi-page document, page stream, and document bundle classification, ...). We identify the lack of public multi-page document classification datasets, formalize different classification tasks arising in application scenarios, and motivate the value of targeting efficient multi-page document representations. An experimental study on proposed multi-page document classification datasets demonstrates that current benchmarks have become irrelevant and need to be updated to evaluate complete documents, as they naturally occur in practice. This reality check also calls for more mature evaluation methodologies, covering calibration evaluation, inference complexity (time-memory), and a range of realistic distribution shifts (e.g., born-digital vs. scanning noise, shifting page order). Our study ends on a hopeful note by recommending concrete avenues for future improvements.

* 8 pages, under review

GADePo: Graph-Assisted Declarative Pooling Transformers for Document-Level Relation Extraction

Aug 28, 2023

Andrei C. Coman, Christos Theodoropoulos, Marie-Francine Moens, James Henderson

Abstract:Document-level relation extraction aims to identify relationships between entities within a document. Current methods rely on text-based encoders and employ various hand-coded pooling heuristics to aggregate information from entity mentions and associated contexts. In this paper, we replace these rigid pooling functions with explicit graph relations by leveraging the intrinsic graph processing capabilities of the Transformer model. We propose a joint text-graph Transformer model, and a graph-assisted declarative pooling (GADePo) specification of the input which provides explicit and high-level instructions for information aggregation. This allows the pooling process to be guided by domain-specific knowledge or desired outcomes but still learned by the Transformer, leading to more flexible and customizable pooling strategies. We extensively evaluate our method across diverse datasets and models, and show that our approach yields promising results that are comparable to those achieved by the hand-coded pooling functions.

Visually-Aware Context Modeling for News Image Captioning

Aug 16, 2023

Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

Abstract:The goal of News Image Captioning is to generate an image caption according to the content of both a news article and an image. To leverage the visual information effectively, it is important to exploit the connection between the context in the articles/captions and the images. Psychological studies indicate that human faces in images draw higher attention priorities. On top of that, humans often play a central role in news stories, as also proven by the face-name co-occurrence pattern we discover in existing News Image Captioning datasets. Therefore, we design a face-naming module for faces in images and names in captions/articles to learn a better name embedding. Apart from names, which can be directly linked to an image area (faces), news image captions mostly contain context information that can only be found in the article. Humans typically address this by searching for relevant information from the article based on the image. To emulate this thought process, we design a retrieval strategy using CLIP to retrieve sentences that are semantically close to the image. We conduct extensive experiments to demonstrate the efficacy of our framework. Without using additional paired data, we establish the new state-of-the-art performance on two News Image Captioning datasets, exceeding the previous state-of-the-art by 5 CIDEr points. We will release code upon acceptance.
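The retrieval step described in this abstract (picking the article sentences that are semantically closest to the image) can be illustrated with a minimal sketch. It assumes image and sentence embeddings are already available in a shared space; the toy 3-dimensional vectors, the function names `cosine` and `retrieve_sentences`, and the choice of cosine similarity with top-k selection are illustrative assumptions, not details taken from the paper or from the CLIP library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_sentences(image_vec, sentence_vecs, sentences, k=2):
    """Return the k sentences whose embeddings are closest to the image embedding."""
    scored = [(cosine(image_vec, vec), sent)
              for vec, sent in zip(sentence_vecs, sentences)]
    # Highest similarity first; only the score is used as the sort key.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in scored[:k]]


# Made-up embeddings standing in for image/text features (assumption).
image_vec = [1.0, 0.0, 0.0]
sentences = [
    "The mayor spoke at the rally.",
    "Stocks fell on Monday.",
    "Crowds gathered downtown.",
]
sentence_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.0, 0.7]]

top = retrieve_sentences(image_vec, sentence_vecs, sentences, k=2)
print(top)
```

In a real pipeline the vectors would come from an image encoder and a text encoder trained into the same space; the ranking logic itself stays the same.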

Multimodal Distillation for Egocentric Action Recognition

Jul 18, 2023

Gorjan Radevski, Dusan Grujicic, Marie-Francine Moens, Matthew Blaschko, Tinne Tuytelaars

Abstract:The focal point of egocentric video understanding is modelling hand-object interactions. Standard models, e.g. CNNs or Vision Transformers, which receive RGB frames as input perform well. However, their performance improves further by employing additional input modalities that provide complementary cues, such as object detections, optical flow, audio, etc. The added complexity of the modality-specific modules, on the other hand, makes these models impractical for deployment. The goal of this work is to retain the performance of such a multimodal approach, while using only the RGB frames as input at inference time. We demonstrate that for egocentric action recognition on the Epic-Kitchens and the Something-Something datasets, students which are taught by multimodal teachers tend to be more accurate and better calibrated than architecturally equivalent models trained on ground truth labels in a unimodal or multimodal fashion. We further adopt a principled multimodal knowledge distillation framework, allowing us to deal with issues which occur when applying multimodal knowledge distillation in a naive manner. Lastly, we demonstrate the achieved reduction in computational complexity, and show that our approach maintains higher performance with the reduction of the number of input views. We release our code at https://github.com/gorjanradevski/multimodal-distillation.

* Accepted at ICCV 2023; Codebase released at https://github.com/gorjanradevski/multimodal-distillation
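As a rough illustration of the idea in the abstract above (a student that sees only RGB is trained to match multimodal teachers), here is a generic knowledge distillation sketch. It shows the naive variant: teacher distributions are averaged uniformly and the student minimizes a temperature-softened KL divergence. The abstract states that the paper adopts a more principled framework, so this is not the authors' method; the modality names, logits, temperature, and all function names are assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_teacher(per_modality_logits, temperature=1.0):
    """Uniformly average the softened class distributions of the teachers."""
    probs = [softmax(logits, temperature) for logits in per_modality_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


def distillation_loss(student_logits, teacher_probs, temperature=1.0):
    """KL(teacher || student) computed on temperature-softened distributions."""
    student_probs = softmax(student_logits, temperature)
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)


# Hypothetical per-modality teacher logits for one clip with three classes.
teacher_logits = {
    "rgb": [2.0, 0.5, 0.1],
    "flow": [1.5, 1.0, 0.2],
    "audio": [0.3, 0.2, 0.1],
}
teacher = ensemble_teacher(list(teacher_logits.values()), temperature=2.0)

# A student that disagrees with the teachers vs. one that roughly agrees.
loss_far = distillation_loss([0.0, 0.0, 3.0], teacher, temperature=2.0)
loss_near = distillation_loss([2.0, 0.8, 0.1], teacher, temperature=2.0)
print(loss_far, loss_near)
```

At inference time only the student is run, on RGB frames alone, which is where the reduction in computational cost comes from.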

Alleviating Exposure Bias in Diffusion Models through Sampling with Shifted Time Steps

May 26, 2023

Mingxiao Li, Tingyu Qu, Wei Sun, Marie-Francine Moens

Abstract:Denoising Diffusion Probabilistic Models (DDPM) have shown remarkable efficacy in the synthesis of high-quality images. However, their inference process characteristically requires numerous, potentially hundreds, of iterative steps, which could lead to the problem of exposure bias due to the accumulation of prediction errors over iterations. Previous work has attempted to mitigate this issue by perturbing inputs during training, which consequently mandates the retraining of the DDPM. In this work, we conduct a systematic study of exposure bias in diffusion models and, intriguingly, we find that the exposure bias could be alleviated with a new sampling method, without retraining the model. We empirically and theoretically show that, during inference, for each backward time step $t$ and corresponding state $\hat{x}_t$, there might exist another time step $t_s$ which exhibits superior coupling with $\hat{x}_t$. Based on this finding, we introduce an inference method named Time-Shift Sampler. Our framework can be seamlessly integrated with existing sampling algorithms, such as DDIM or DDPM, inducing merely minimal additional computations. Experimental results show that our proposed framework can effectively enhance the quality of images generated by existing sampling algorithms.

Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

May 26, 2023

Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, Marie-Francine Moens

Abstract:Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. Our research invites further exploration of the decoding task's potential and contributes to the development of non-invasive brain-machine interfaces.

* 17 pages, 6 figures, conference

A Memory Model for Question Answering from Streaming Data Supported by Rehearsal and Anticipation of Coreference Information

May 12, 2023

Vladimir Araujo, Alvaro Soto, Marie-Francine Moens

Abstract:Existing question answering methods often assume that the input content (e.g., documents or videos) is always accessible to solve the task. Alternatively, memory networks were introduced to mimic the human process of incremental comprehension and compression of the information in a fixed-capacity memory. However, these models only learn how to maintain memory by backpropagating errors in the answers through the entire network. Instead, it has been suggested that humans have effective mechanisms to boost their memorization capacities, such as rehearsal and anticipation. Drawing inspiration from these, we propose a memory model that performs rehearsal and anticipation while processing inputs to memorize important information for solving question answering tasks from streaming data. The proposed mechanisms are applied self-supervised during training through masked modeling tasks focused on coreference information. We validate our model on a short-sequence (bAbI) dataset as well as large-sequence textual (NarrativeQA) and video (ActivityNet-QA) question answering datasets, where it achieves substantial improvements over previous memory network approaches. Furthermore, our ablation study confirms the proposed mechanisms' importance for memory models.

* Accepted paper at ACL2023 Findings

An Information Extraction Study: Take In Mind the Tokenization!

Apr 01, 2023

Christos Theodoropoulos, Marie-Francine Moens

Abstract:Current research on the advantages and trade-offs of using characters, instead of tokenized text, as input for deep learning models, has evolved substantially. New token-free models remove the traditional tokenization step; however, their efficiency remains unclear. Moreover, the effect of tokenization is relatively unexplored in sequence tagging tasks. To this end, we investigate the impact of tokenization when extracting information from documents and present a comparative study and analysis of subword-based and character-based models. Specifically, we study Information Extraction (IE) from biomedical texts. The main outcome is twofold: tokenization patterns can introduce inductive bias that results in state-of-the-art performance, and the character-based models produce promising results; thus, transitioning to token-free IE models is feasible.

* Submitted Manuscript/Preprint (accepted at EUSFLAT 2023, to be published in Lecture Notes in Computer Science (LNCS))
