Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Auxiliary Task Update Decomposition: The Good, The Bad and The Neutral

Aug 25, 2021
Lucio M. Dery, Yann Dauphin, David Grangier

While deep learning has been very beneficial in data-rich settings, tasks with smaller training set often resort to pre-training or multitask learning to leverage data from other tasks. In this case, careful consideration is needed to select tasks and model parameterizations such that updates from the auxiliary tasks actually help the primary task. We seek to alleviate this burden by formulating a model-agnostic framework that performs fine-grained manipulation of the auxiliary task gradients. We propose to decompose auxiliary updates into directions which help, damage or leave the primary task loss unchanged. This allows weighting the update directions differently depending on their impact on the problem of interest. We present a novel and efficient algorithm for that purpose and show its advantage in practice. Our method leverages efficient automatic differentiation procedures and randomized singular value decomposition for scalability. We show that our framework is generic and encompasses some prior work as particular cases. Our approach consistently outperforms strong and widely used baselines when leveraging out-of-distribution data for Text and Image classification tasks.

* 15 pages, 3 figures, Accepted to International Conference on Learning Representations (ICLR) 2021 See}{ for associated code 

  Access Paper or Ask Questions

Emotion Stimulus Detection in German News Headlines

Jul 28, 2021
Bao Minh Doan Dang, Laura Oberländer, Roman Klinger

Emotion stimulus extraction is a fine-grained subtask of emotion analysis that focuses on identifying the description of the cause behind an emotion expression from a text passage (e.g., in the sentence "I am happy that I passed my exam" the phrase "passed my exam" corresponds to the stimulus.). Previous work mainly focused on Mandarin and English, with no resources or models for German. We fill this research gap by developing a corpus of 2006 German news headlines annotated with emotions and 811 instances with annotations of stimulus phrases. Given that such corpus creation efforts are time-consuming and expensive, we additionally work on an approach for projecting the existing English GoodNewsEveryone (GNE) corpus to a machine-translated German version. We compare the performance of a conditional random field (CRF) model (trained monolingually on German and cross-lingually via projection) with a multilingual XLM-RoBERTa (XLM-R) model. Our results show that training with the German corpus achieves higher F1 scores than projection. Experiments with XLM-R outperform their respective CRF counterparts.

* accepted at KONVENS 2021 

  Access Paper or Ask Questions

Statistical Estimation from Dependent Data

Jul 20, 2021
Yuval Dagan, Constantinos Daskalakis, Nishanth Dikkala, Surbhi Goel, Anthimos Vardis Kandiros

We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent, capturing settings where e.g. these observations are collected on a spatial domain, a temporal domain, or a social network, which induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow these dependencies to be substantial, i.e do not assume that the Markov Random Field capturing these dependencies is in high temperature. As our main contribution we provide algorithms and statistically efficient estimation rates for this model, giving several instantiations of our bounds in logistic regression, sparse logistic regression, and neural network settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i.e. external fields and interaction strengths) of Ising models from a {\em single} sample. {We evaluate our estimation approach on real networked data, showing that it outperforms standard regression approaches that ignore dependencies, across three text classification datasets: Cora, Citeseer and Pubmed.}

* 41 pages, ICML 2021 

  Access Paper or Ask Questions

Capturing Event Argument Interaction via A Bi-Directional Entity-Level Recurrent Decoder

Jul 01, 2021
Xiangyu Xi, Wei Ye, Shikun Zhang, Quanxiu Wang, Huixing Jiang, Wei Wu

Capturing interactions among event arguments is an essential step towards robust event argument extraction (EAE). However, existing efforts in this direction suffer from two limitations: 1) The argument role type information of contextual entities is mainly utilized as training signals, ignoring the potential merits of directly adopting it as semantically rich input features; 2) The argument-level sequential semantics, which implies the overall distribution pattern of argument roles over an event mention, is not well characterized. To tackle the above two bottlenecks, we formalize EAE as a Seq2Seq-like learning problem for the first time, where a sentence with a specific event trigger is mapped to a sequence of event argument roles. A neural architecture with a novel Bi-directional Entity-level Recurrent Decoder (BERD) is proposed to generate argument roles by incorporating contextual entities' argument role predictions, like a word-by-word text generation process, thereby distinguishing implicit argument distribution patterns within an event more accurately.

* ACL-IJCNLP 2021 

  Access Paper or Ask Questions

A Template-guided Hybrid Pointer Network for Knowledge-basedTask-oriented Dialogue Systems

Jun 10, 2021
Dingmin Wang, Ziyao Chen, Wanwei He, Li Zhong, Yunzhe Tao, Min Yang

Most existing neural network based task-oriented dialogue systems follow encoder-decoder paradigm, where the decoder purely depends on the source texts to generate a sequence of words, usually suffering from instability and poor readability. Inspired by the traditional template-based generation approaches, we propose a template-guided hybrid pointer network for the knowledge-based task-oriented dialogue system, which retrieves several potentially relevant answers from a pre-constructed domain-specific conversational repository as guidance answers, and incorporates the guidance answers into both the encoding and decoding processes. Specifically, we design a memory pointer network model with a gating mechanism to fully exploit the semantic correlation between the retrieved answers and the ground-truth response. We evaluate our model on four widely used task-oriented datasets, including one simulated and three manually created datasets. The experimental results demonstrate that the proposed model achieves significantly better performance than the state-of-the-art methods over different automatic evaluation metrics.

* DialDoc [email protected] 

  Access Paper or Ask Questions

Recent Advances and Trends in Multimodal Deep Learning: A Review

May 24, 2021
Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Jabbar Abdul

Deep Learning has implemented a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information using various modalities. Despite the extensive development made for unimodal learning, it still cannot cover all the aspects of human learning. Multimodal learning helps to understand and analyze better when various senses are engaged in the processing of information. This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. Detailed analysis of past and current baseline approaches and an in-depth study of recent advancements in multimodal deep learning applications has been provided. A fine-grained taxonomy of various multimodal deep learning applications is proposed, elaborating on different applications in more depth. Architectures and datasets used in these applications are also discussed, along with their evaluation metrics. Last, main issues are highlighted separately for each domain along with their possible future research directions.

  Access Paper or Ask Questions

Minimal Supervision for Morphological Inflection

Apr 17, 2021
Omer Goldman, Reut Tsarfaty

Neural models for the various flavours of morphological inflection tasks have proven to be extremely accurate given ample labeled data -- data that may be slow and costly to obtain. In this work we aim to overcome this annotation bottleneck by bootstrapping labeled data from a seed as little as {\em five} labeled paradigms, accompanied by a large bulk of unlabeled text. Our approach exploits different kinds of regularities in morphological systems in a two-phased setup, where word tagging based on {\em analogies} is followed by word pairing based on {\em distances}. We experiment with the Paradigm Cell Filling Problem over eight typologically different languages, and find that, in languages with relatively simple morphology, orthographic regularities on their own allow inflection models to achieve respectable accuracy. Combined orthographic and semantic regularities alleviate difficulties with particularly complex morpho-phonological systems. Our results suggest that hand-crafting many tagged examples might be an unnecessary effort. However, more work is needed in order to address rarely used forms.

  Access Paper or Ask Questions

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Mar 24, 2021
Amaia Salvador, Erhan Gundogdu, Loris Bazzani, Michael Donoser

Cross-modal recipe retrieval has recently gained substantial attention due to the importance of food in people's lives, as well as the availability of vast amounts of digital cooking recipes and food images to train machine learning models. In this work, we revisit existing approaches for cross-modal recipe retrieval and propose a simplified end-to-end model based on well established and high performing encoders for text and images. We introduce a hierarchical recipe Transformer which attentively encodes individual recipe components (titles, ingredients and instructions). Further, we propose a self-supervised loss function computed on top of pairs of individual recipe components, which is able to leverage semantic relationships within recipes, and enables training using both image-recipe and recipe-only samples. We conduct a thorough analysis and ablation studies to validate our design choices. As a result, our proposed method achieves state-of-the-art performance in the cross-modal recipe retrieval task on the Recipe1M dataset. We make code and models publicly available.

* CVPR 2021 

  Access Paper or Ask Questions

Generating Synthetic Handwritten Historical Documents With OCR Constrained GANs

Mar 15, 2021
Lars Vögtlin, Manuel Drazyk, Vinaychandran Pondenkandath, Michele Alberti, Rolf Ingold

We present a framework to generate synthetic historical documents with precise ground truth using nothing more than a collection of unlabeled historical images. Obtaining large labeled datasets is often the limiting factor to effectively use supervised deep learning methods for Document Image Analysis (DIA). Prior approaches towards synthetic data generation either require expertise or result in poor accuracy in the synthetic documents. To achieve high precision transformations without requiring expertise, we tackle the problem in two steps. First, we create template documents with user-specified content and structure. Second, we transfer the style of a collection of unlabeled historical images to these template documents while preserving their text and layout. We evaluate the use of our synthetic historical documents in a pre-training setting and find that we outperform the baselines (randomly initialized and pre-trained). Additionally, with visual examples, we demonstrate a high-quality synthesis that makes it possible to generate large labeled historical document datasets with precise ground truth.

  Access Paper or Ask Questions

Machine Translation Customization via Automatic Training Data Selection from the Web

Feb 20, 2021
Thuy Vu, Alessandro Moschitti

Machine translation (MT) systems, especially when designed for an industrial setting, are trained with general parallel data derived from the Web. Thus, their style is typically driven by word/structure distribution coming from the average of many domains. In contrast, MT customers want translations to be specialized to their domain, for which they are typically able to provide text samples. We describe an approach for customizing MT systems on specific domains by selecting data similar to the target customer data to train neural translation models. We build document classifiers using monolingual target data, e.g., provided by the customers to select parallel training data from Web crawled data. Finally, we train MT models on our automatically selected data, obtaining a system specialized to the target domain. We tested our approach on the benchmark from WMT-18 Translation Task for News domains enabling comparisons with state-of-the-art MT systems. The results show that our models outperform the top systems while using less data and smaller models.

* ECIR 2021 

  Access Paper or Ask Questions