Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks

Dec 01, 2020
Shree Charran R, Rahul Kumar Dubey

Word Representations form the core component for almost all advanced Natural Language Processing (NLP) applications such as text mining, question-answering, and text summarization, etc. Over the last two decades, immense research is conducted to come up with one single model to solve all major NLP tasks. The major problem currently is that there are a plethora of choices for different NLP tasks. Thus for NLP practitioners, the task of choosing the right model to be used itself becomes a challenge. Thus combining multiple pre-trained word embeddings and forming meta embeddings has become a viable approach to improve tackle NLP tasks. Meta embedding learning is a process of producing a single word embedding from a given set of pre-trained input word embeddings. In this paper, we propose to use Meta Embedding derived from few State-of-the-Art (SOTA) models to efficiently tackle mainstream NLP tasks like classification, semantic relatedness, and text similarity. We have compared both ensemble and dynamic variants to identify an efficient approach. The results obtained show that even the best State-of-the-Art models can be bettered. Thus showing us that meta-embeddings can be used for several NLP tasks by harnessing the power of several individual representations.

  Access Paper or Ask Questions

Incorporating End-to-End Speech Recognition Models for Sentiment Analysis

Feb 28, 2019
Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter

Previous work on emotion recognition demonstrated a synergistic effect of combining several modalities such as auditory, visual, and transcribed text to estimate the affective state of a speaker. Among these, the linguistic modality is crucial for the evaluation of an expressed emotion. However, manually transcribed spoken text cannot be given as input to a system practically. We argue that using ground-truth transcriptions during training and evaluation phases leads to a significant discrepancy in performance compared to real-world conditions, as the spoken text has to be recognized on the fly and can contain speech recognition mistakes. In this paper, we propose a method of integrating an automatic speech recognition (ASR) output with a character-level recurrent neural network for sentiment recognition. In addition, we conduct several experiments investigating sentiment recognition for human-robot interaction in a noise-realistic scenario which is challenging for the ASR systems. We quantify the improvement compared to using only the acoustic modality in sentiment recognition. We demonstrate the effectiveness of this approach on the Multimodal Corpus of Sentiment Intensity (MOSI) by achieving 73,6% accuracy in a binary sentiment classification task, exceeding previously reported results that use only acoustic input. In addition, we set a new state-of-the-art performance on the MOSI dataset (80.4% accuracy, 2% absolute improvement).

* Accepted at the 2019 International Conference on Robotics and Automation (ICRA) will be held on May 20-24, 2019 in Montreal, Canada 

  Access Paper or Ask Questions

Leverage Score Sampling for Faster Accelerated Regression and ERM

Nov 22, 2017
Naman Agarwal, Sham Kakade, Rahul Kidambi, Yin Tat Lee, Praneeth Netrapalli, Aaron Sidford

Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{d}$, we show how to compute an $\epsilon$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1}) $ where $\kappa_{\text{sum}}=\mathrm{tr}\left(\mathbf{A}^{\top}\mathbf{A}\right)/\lambda_{\min}(\mathbf{A}^{T}\mathbf{A})$ and $s$ is the maximum number of non-zero entries in a row of $\mathbf{A}$. Our algorithm improves upon the previous best running time of $ \tilde{O} ((n+\sqrt{n \cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1})$. We achieve our result through a careful combination of leverage score sampling techniques, proximal point methods, and accelerated coordinate descent. Our method not only matches the performance of previous methods, but further improves whenever leverage scores of rows are small (up to polylogarithmic factors). We also provide a non-linear generalization of these results that improves the running time for solving a broader class of ERM problems.

  Access Paper or Ask Questions

Prompt-Based Multi-Modal Image Segmentation

Dec 18, 2021
Timo Lüddecke, Alexander S. Ecker

Image segmentation is usually addressed by training a model for a fixed set of object classes. Incorporating additional classes or more complex queries later is expensive as it requires re-training the model on a dataset that encompasses these expressions. Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. A prompt can be either a text or an image. This approach enables us to create a unified model (trained once) for three common segmentation tasks, which come with distinct challenges: referring expression segmentation, zero-shot segmentation and one-shot segmentation. We build upon the CLIP model as a backbone which we extend with a transformer-based decoder that enables dense prediction. After training on an extended version of the PhraseCut dataset, our system generates a binary segmentation map for an image based on a free-text prompt or on an additional image expressing the query. Different variants of the latter image-based prompts are analyzed in detail. This novel hybrid input allows for dynamic adaptation not only to the three segmentation tasks mentioned above, but to any binary segmentation task where a text or image query can be formulated. Finally, we find our system to adapt well to generalized queries involving affordances or properties. Source code:

  Access Paper or Ask Questions

DaRE: A Cross-Domain Recommender System with Domain-aware Feature Extraction and Review Encoder

Oct 25, 2021
Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chongkwon Kim

Recent advent in recommender systems, especially text-aided methods and CDR (Cross-Domain Recommendation) leads to promising results in solving data-sparsity and cold-start problems. Despite such progress, prior algorithms either require user overlapping or ignore domain-aware feature extraction. In addition, text-aided methods exceedingly emphasize aggregated documents and fail to capture the specifics embedded in individual reviews. To overcome such limitations, we propose a novel method, named DaRE (Domainaware Feature Extraction and Review Encoder), a comprehensive solution that consists of three key components; text-based representation learning, domain-aware feature extraction, and a review encoder. DaRE debilitate noises by separating domain-invariant features from domain-specific features through selective adversarial training. DaRE extracts features from aggregated documents, and the review encoder fine-tunes the representations by aligning them with the features extracted from individual reviews. Experiments on four real-world datasets show the superiority of DaRE over state-ofthe-art single-domain and cross-domain methodologies, achieving 9.2 % and 3.6 % improvements, respectively. We upload our implementations ( for a reproducibility

  Access Paper or Ask Questions

Drug and Disease Interpretation Learning with Biomedical Entity Representation Transformer

Jan 22, 2021
Zulfat Miftahutdinov, Artur Kadurin, Roman Kudrin, Elena Tutubalina

Concept normalization in free-form texts is a crucial step in every text-mining pipeline. Neural architectures based on Bidirectional Encoder Representations from Transformers (BERT) have achieved state-of-the-art results in the biomedical domain. In the context of drug discovery and development, clinical trials are necessary to establish the efficacy and safety of drugs. We investigate the effectiveness of transferring concept normalization from the general biomedical domain to the clinical trials domain in a zero-shot setting with an absence of labeled data. We propose a simple and effective two-stage neural approach based on fine-tuned BERT architectures. In the first stage, we train a metric learning model that optimizes relative similarity of mentions and concepts via triplet loss. The model is trained on available labeled corpora of scientific abstracts to obtain vector embeddings of concept names and entity mentions from texts. In the second stage, we find the closest concept name representation in an embedding space to a given clinical mention. We evaluated several models, including state-of-the-art architectures, on a dataset of abstracts and a real-world dataset of trial records with interventions and conditions mapped to drug and disease terminologies. Extensive experiments validate the effectiveness of our approach in knowledge transfer from the scientific literature to clinical trials.

* Accepted to the 43rd European Conference on Information Retrieval (ECIR 2021) 

  Access Paper or Ask Questions

Attention improves concentration when learning node embeddings

Jun 11, 2020
Matthew Dippel, Adam Kiezun, Tanay Mehta, Ravi Sundaram, Srikanth Thirumalai, Akshar Varma

We consider the problem of predicting edges in a graph from node attributes in an e-commerce setting. Specifically, given nodes labelled with search query text, we want to predict links to related queries that share products. Experiments with a range of deep neural architectures show that simple feedforward networks with an attention mechanism perform best for learning embeddings. The simplicity of these models allows us to explain the performance of attention. We propose an analytically tractable model of query generation, AttEST, that views both products and the query text as vectors embedded in a latent space. We prove (and empirically validate) that the point-wise mutual information (PMI) matrix of the AttEST query text embeddings displays a low-rank behavior analogous to that observed in word embeddings. This low-rank property allows us to derive a loss function that maximizes the mutual information between related queries which is used to train an attention network to learn query embeddings. This AttEST network beats traditional memory-based LSTM architectures by over 20% on F-1 score. We justify this out-performance by showing that the weights from the attention mechanism correlate strongly with the weights of the best linear unbiased estimator (BLUE) for the product vectors, and conclude that attention plays an important role in variance reduction.

* 18 pages, 3 figures 

  Access Paper or Ask Questions

Creative Artificial Intelligence -- Algorithms vs. humans in an incentivized writing competition

May 20, 2020
Nils Köbis, Luca Mossink

The release of openly available, robust text generation algorithms has spurred much public attention and debate, due to algorithm's purported ability to generate human-like text across various domains. Yet, empirical evidence using incentivized tasks to assess human behavioral reactions to such algorithms is lacking. We conducted two experiments assessing behavioral reactions to the state-of-the-art Natural Language Generation algorithm GPT-2 (Ntotal = 830). Using the identical starting lines of human poems, GPT-2 produced samples of multiple algorithmically-generated poems. From these samples, either a random poem was chosen (Human-out-of-the-loop) or the best one was selected (Human-in-the-loop) and in turn matched with a human written poem. Taking part in a new incentivized version of the Turing Test, participants failed to reliably detect the algorithmically-generated poems in the human-in-the-loop treatment, yet succeeded in the Human-out-of-the-loop treatment. Further, the results reveal a general aversion towards algorithmic poetry, independent on whether participants were informed about the algorithmic origin of the poem (Transparency) or not (Opacity). We discuss what these results convey about the performance of NLG algorithms to produce human-like text and propose methodologies to study such learning algorithms in experimental settings.

  Access Paper or Ask Questions

Multimodal Categorization of Crisis Events in Social Media

Apr 10, 2020
Mahdi Abavisani, Liwei Wu, Shengli Hu, Joel Tetreault, Alejandro Jaimes

Recent developments in image classification and natural language processing, coupled with the rapid growth in social media usage, have enabled fundamental advances in detecting breaking events around the world in real-time. Emergency response is one such area that stands to gain from these advances. By processing billions of texts and images a minute, events can be automatically detected to enable emergency response workers to better assess rapidly evolving situations and deploy resources accordingly. To date, most event detection techniques in this area have focused on image-only or text-only approaches, limiting detection performance and impacting the quality of information delivered to crisis response teams. In this paper, we present a new multimodal fusion method that leverages both images and texts as input. In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities on a sample by sample basis. In addition, we employ a multimodal graph-based approach to stochastically transition between embeddings of different multimodal pairs during training to better regularize the learning process as well as dealing with limited training data by constructing new matched pairs from different samples. We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.

* Conference on Computer Vision and Pattern Recognition (CVPR 2020) 
* Conference on Computer Vision and Pattern Recognition (CVPR 2020) 

  Access Paper or Ask Questions

Semantic Enrichment of Nigerian Pidgin English for Contextual Sentiment Classification

Mar 27, 2020
Wuraola Fisayo Oyewusi, Olubayo Adekanmbi, Olalekan Akinsande

Nigerian English adaptation, Pidgin, has evolved over the years through multi-language code switching, code mixing and linguistic adaptation. While Pidgin preserves many of the words in the normal English language corpus, both in spelling and pronunciation, the fundamental meaning of these words have changed significantly. For example,'ginger' is not a plant but an expression of motivation and 'tank' is not a container but an expression of gratitude. The implication is that the current approach of using direct English sentiment analysis of social media text from Nigeria is sub-optimal, as it will not be able to capture the semantic variation and contextual evolution in the contemporary meaning of these words. In practice, while many words in Nigerian Pidgin adaptation are the same as the standard English, the full English language based sentiment analysis models are not designed to capture the full intent of the Nigerian pidgin when used alone or code-mixed. By augmenting scarce human labelled code-changed text with ample synthetic code-reformatted text and meaning, we achieve significant improvements in sentiment scoring. Our research explores how to understand sentiment in an intrasentential code mixing and switching context where there has been significant word localization.This work presents a 300 VADER lexicon compatible Nigerian Pidgin sentiment tokens and their scores and a 14,000 gold standard Nigerian Pidgin tweets and their sentiments labels.

* Accepted to ICLR 2020 AfricaNLP workshop 

  Access Paper or Ask Questions