"Sentiment": models, code, and papers

Corpus Statistics in Text Classification of Online Data

Mar 16, 2018
Marina Sokolova, Victoria Bobicev

The transformation of Machine Learning (ML) from a boutique science to a generally accepted technology has increased the importance of reproduction and transportability of ML studies. In the current work, we investigate how corpus characteristics of textual data sets correspond to text classification results. We work with two data sets gathered from sub-forums of an online health-related forum. Our empirical results are obtained for a multi-class sentiment analysis application.

* 12 pages, 6 tables, 1 figure 

Automated Generation of Multilingual Clusters for the Evaluation of Distributed Representations

Apr 05, 2017
Philip Blair, Yuval Merhav, Joel Barry

We propose a language-agnostic way of automatically generating sets of semantically similar clusters of entities along with sets of "outlier" elements, which may then be used to perform an intrinsic evaluation of word embeddings in the outlier detection task. We used our methodology to create a gold-standard dataset, which we call WikiSem500, and evaluated multiple state-of-the-art embeddings. The results show a correlation between performance on this dataset and performance on sentiment analysis.

* Published as a workshop paper at ICLR 2017 
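
The outlier detection task mentioned in this abstract can be illustrated with a small sketch. It assumes a common formulation of the task, in which the word with the lowest mean cosine similarity to the rest of its group is predicted as the outlier; the abstract itself does not spell out the scoring rule. The three-dimensional `toy_vectors` are hypothetical stand-ins for real embeddings, not entries from WikiSem500.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_outlier(words, vectors):
    """Return the word with the lowest mean similarity to the other words.

    Assumption: this scoring rule is one common choice for the task,
    not necessarily the exact rule used by the paper.
    """
    scores = {}
    for word in words:
        others = [other for other in words if other != word]
        sims = [cosine(vectors[word], vectors[other]) for other in others]
        scores[word] = sum(sims) / len(sims)
    return min(scores, key=scores.get), scores

# Hypothetical toy embeddings: three fruits and one unrelated word.
toy_vectors = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "plum": [0.85, 0.15, 0.05],
    "car": [0.0, 0.1, 0.95],
}
group = ["apple", "pear", "plum", "car"]
predicted, scores = predict_outlier(group, toy_vectors)
print(predicted)
```

An embedding is scored well on a group when the predicted outlier matches the annotated one, so accuracy over many such groups gives an intrinsic evaluation that needs no downstream model.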

A Tidy Data Model for Natural Language Processing using cleanNLP

May 03, 2018
Taylor Arnold

The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford's CoreNLP library, exposing a number of annotation tasks for text written in English, French, German, and Spanish. Annotators include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction.

* The R Journal, 9.2, 248-267 (2017) 
* 20 pages; 4 figures 
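
To make the idea of "normalized tables" concrete, here is a minimal Python sketch of a one-row-per-token table. It is not the cleanNLP API (cleanNLP is an R package) and does not call CoreNLP: the function `tidy_tokens`, the column names, and the naive period and whitespace splitting are all assumptions made for illustration.

```python
def tidy_tokens(corpus):
    """Flatten a {doc_id: text} corpus into one row per token.

    Assumption: sentences end at "." and tokens are whitespace-separated.
    A real pipeline would use a proper tokenizer and sentence splitter.
    """
    rows = []
    for doc_id, text in corpus.items():
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        for sentence_id, sentence in enumerate(sentences, start=1):
            for token_id, token in enumerate(sentence.split(), start=1):
                rows.append({
                    "doc_id": doc_id,            # hypothetical column names
                    "sentence_id": sentence_id,
                    "token_id": token_id,
                    "token": token,
                })
    return rows

corpus = {"doc1": "The food was great. Service was slow."}
rows = tidy_tokens(corpus)
for row in rows:
    print(row)
```

Because every row is keyed by document, sentence, and token position, further annotations such as part-of-speech tags or entities could live in separate tables that join on those keys.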

Subject Specific Stream Classification Preprocessing Algorithm for Twitter Data Stream

May 28, 2017
Nisansa de Silva, Danaja Maldeniya, Chamilka Wijeratne

The micro-blogging service Twitter is a lucrative source for data mining applications on global sentiment. But due to the omnifariousness of the subjects mentioned in each data item, it is inefficient to run a data mining algorithm on the raw data. This paper discusses an algorithm to accurately classify the entire stream into a given number of mutually exclusive, collectively exhaustive streams, upon each of which the data mining algorithm can be run separately, yielding more relevant results with high efficiency.

* 6 pages 

Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate Label Spaces

Apr 09, 2018
Isabelle Augenstein, Sebastian Ruder, Anders Søgaard

We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state of the art for topic-based sentiment analysis.

* To appear at NAACL 2018 (long) 

Improved Neural Text Attribute Transfer with Non-parallel Data

Dec 04, 2017
Igor Melnyk, Cicero Nogueira dos Santos, Kahini Wadhawan, Inkit Padhi, Abhishek Kumar

Text attribute transfer using non-parallel data requires methods that can perform disentanglement of content and linguistic attributes. In this work, we propose multiple improvements over the existing approaches that enable the encoder-decoder framework to cope with the text attribute transfer from non-parallel data. We perform experiments on the sentiment transfer task using two datasets. For both datasets, our proposed method outperforms a strong baseline in two of the three employed evaluation metrics.

* NIPS 2017 Workshop on Learning Disentangled Representations: from Perception to Control 

Multichannel Variable-Size Convolution for Sentence Classification

Mar 15, 2016
Wenpeng Yin, Hinrich Schütze

We propose MVCNN, a convolutional neural network (CNN) architecture for sentence classification. It (i) combines diverse versions of pretrained word embeddings and (ii) extracts features of multigranular phrases with variable-size convolution filters. We also show that pretraining MVCNN is critical for good performance. MVCNN achieves state-of-the-art performance on four tasks: on small-scale binary, small-scale multi-class and large-scale Twitter sentiment prediction and on subjectivity classification.

* In Proceedings of CoNLL 2015
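
The two ingredients named in the abstract, multiple embedding channels and filters of several widths, can be sketched in a few lines. This is a toy illustration, not the MVCNN architecture: the function `conv_max_pool`, the hand-picked kernels, the ReLU, the plain max-pooling, and the two-dimensional "embeddings" are all assumptions, and a real model would learn its filters.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conv_max_pool(channels, kernel):
    """Slide one filter over all channels and max-pool over positions.

    channels: list of embedding versions, each a list of token vectors.
    kernel:   list of `width` weight vectors, shared across channels
              (sharing is an assumption made to keep the sketch short).
    """
    width = len(kernel)
    n_tokens = len(channels[0])
    best = 0.0
    for start in range(n_tokens - width + 1):
        total = 0.0
        for channel in channels:
            for offset in range(width):
                total += dot(kernel[offset], channel[start + offset])
        best = max(best, max(total, 0.0))  # ReLU, then running max
    return best

# Two hypothetical embedding versions of the same 4-token sentence.
channel_a = [[1, 0], [0, 1], [1, 1], [0, 0]]
channel_b = [[0.5, 0.5], [1, 0], [0, 1], [1, 1]]

# One hand-picked filter per width; widths 2 and 3 cover phrases of
# different granularity.
kernels = {
    2: [[1, 0], [0, 1]],
    3: [[1, 1], [-1, -1], [1, 1]],
}
features = [conv_max_pool([channel_a, channel_b], kernels[w]) for w in sorted(kernels)]
print(features)
```

Each filter width contributes one pooled feature here, so the resulting feature vector has a fixed length regardless of sentence length.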

Inhibited Softmax for Uncertainty Estimation in Neural Networks

Oct 03, 2018
Marcin Możejko, Mateusz Susik, Rafał Karczewski

We present a new method for uncertainty estimation and out-of-distribution detection in neural networks with softmax output. We extend the softmax layer with an additional constant input. The corresponding additional output is able to represent the uncertainty of the network. The proposed method requires neither additional parameters nor multiple forward passes nor input preprocessing nor out-of-distribution datasets. We show that our method performs comparably to more computationally expensive methods and outperforms baselines in our experiments in the image recognition and sentiment analysis domains.
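
The core mechanism described above, a softmax with one extra constant input whose output is read as uncertainty, can be sketched as follows. The abstract gives neither the value of the constant nor any training details, so `constant=1.0` and the function name `inhibited_softmax` are assumptions, and this sketch covers only the forward computation.

```python
import math

def inhibited_softmax(logits, constant=1.0):
    """Softmax over the logits plus one constant input.

    Returns (class_probabilities, uncertainty), where uncertainty is the
    probability mass assigned to the extra constant input.
    Assumption: constant=1.0 is an illustrative value only.
    """
    extended = list(logits) + [constant]
    shift = max(extended)                      # for numerical stability
    exps = [math.exp(x - shift) for x in extended]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs[:-1], probs[-1]

# A confident prediction: one logit dominates the constant.
probs_conf, unc_conf = inhibited_softmax([8.0, 0.0, 0.0])
# An unsure prediction: no logit exceeds the constant.
probs_flat, unc_flat = inhibited_softmax([0.1, 0.0, 0.05])
print(unc_conf, unc_flat)
```

When no class logit rises above the constant, more of the probability mass falls on the extra output, which is why it can serve as an uncertainty signal without extra parameters or additional forward passes.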

Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition

Oct 05, 2020
Jean-Benoit Delbrouck, Noé Tits, Stéphane Dupont

This paper aims to bring a new lightweight yet powerful solution for the task of Emotion Recognition and Sentiment Analysis. Our motivation is to propose two architectures based on Transformers and modulation that combine the linguistic and acoustic inputs from a wide range of datasets to challenge, and sometimes surpass, the state-of-the-art in the field. To demonstrate the efficiency of our models, we carefully evaluate their performance on the IEMOCAP, MOSI, MOSEI and MELD datasets. The experiments can be directly replicated and the code is fully open for future research.

* EMNLP 2020 workshop: NLP Beyond Text (NLPBT) 

Dependency-based Convolutional Neural Networks for Sentence Embedding

Aug 03, 2015
Mingbo Ma, Liang Huang, Bing Xiang, Bowen Zhou

In sentence modeling and classification, convolutional neural network approaches have recently achieved state-of-the-art results, but all such efforts process word vectors sequentially and neglect long-distance dependencies. To exploit both deep learning and linguistic structures, we propose a tree-based convolutional neural network model which exploits various long-distance relationships between words. Our model improves on the sequential baselines on all three sentiment and question classification tasks, and achieves the highest published accuracy on TREC.

* This paper has been accepted by ACL 2015
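
One way to picture a tree-based convolution window, as opposed to a sequential one, is to collect each word together with its ancestors in the dependency tree. The sketch below does only that. The abstract does not say which tree relations the model uses, so the ancestor-path window, the function `ancestor_window`, the `<PAD>` token, and the hand-written parse are all assumptions made for illustration.

```python
def ancestor_window(tokens, heads, index, size=3):
    """Return the word at `index` followed by its ancestors, padded to `size`.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    window = []
    current = index
    while len(window) < size:
        if current == -1:
            window.append("<PAD>")        # ran past the root
        else:
            window.append(tokens[current])
            current = heads[current]
    return window

# Hypothetical hand-written parse: "good" is the root, "movie" is its
# subject, and "The" attaches to "movie".
tokens = ["The", "movie", "was", "surprisingly", "good"]
heads = [1, 4, 4, 4, -1]

windows = [ancestor_window(tokens, heads, i) for i in range(len(tokens))]
for window in windows:
    print(window)
```

A sequential window of size 3 starting at "The" would see "The movie was", whereas the tree-based window links "The" directly to "good" through "movie", which is the kind of long-distance relationship the abstract refers to.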
