"Sentiment": models, code, and papers

In Search of Credible News

Nov 19, 2019
Momchil Hardalov, Ivan Koychev, Preslav Nakov

We study the problem of finding fake online news. This is an important problem, as news of questionable credibility has recently been proliferating in social media at an alarming scale. As this is an understudied problem, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four online sources. We then propose a language-independent approach for automatically distinguishing credible from fake news, based on a rich feature set. In particular, we use linguistic (n-gram), credibility-related (capitalization, punctuation, pronoun use, sentiment polarity), and semantic (embeddings and DBPedia data) features. Our experiments on three different test sets show that our model can distinguish credible from fake news with very high accuracy.

* AIMSA-2016 
* Credibility, veracity, fact checking, humor detection 
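
The credibility-related features named in the abstract above (capitalization, punctuation, pronoun use) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name, the feature names, the ASCII-only tokenizer, and the small English pronoun list are hypothetical and are not taken from the paper, whose exact feature definitions are not given in this listing.

```python
import re
import string

# Hypothetical English pronoun list, for illustration only.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "she", "they", "them"}


def credibility_features(text: str) -> dict:
    """Compute a few surface-level style features for one article."""
    # Simplistic ASCII tokenizer (an assumption; a real system would
    # need a tokenizer suited to the target language).
    tokens = re.findall(r"[A-Za-z']+", text)
    n_tokens = max(len(tokens), 1)  # avoid division by zero
    n_chars = max(len(text), 1)
    return {
        # Share of tokens written entirely in capitals (e.g. "SHOCKING").
        "all_caps_ratio": sum(t.isupper() and len(t) > 1 for t in tokens) / n_tokens,
        # Raw counts of attention-grabbing punctuation.
        "exclamation_count": text.count("!"),
        "question_count": text.count("?"),
        # Share of characters that are punctuation marks.
        "punct_ratio": sum(ch in string.punctuation for ch in text) / n_chars,
        # Share of tokens that are pronouns from the list above.
        "pronoun_ratio": sum(t.lower() in PRONOUNS for t in tokens) / n_tokens,
    }


features = credibility_features("SHOCKING news! You won't believe it!")
```

Features of this kind would typically be concatenated with n-gram and embedding features before being passed to a classifier.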


A Neural Approach to Irony Generation

Sep 16, 2019
Mengdi Zhu, Zhiwei Yu, Xiaojun Wan

Irony can not only express stronger emotions but also convey a sense of humor. With the development of social media, irony is widely used in public. Although much prior research has addressed irony detection, few studies focus on irony generation. The main challenges for irony generation are the lack of a large-scale irony dataset and the difficulty of modeling ironic patterns. In this work, we first systematically define irony generation as a style transfer task. To address the lack of data, we make use of Twitter and build a large-scale dataset. We also design a combination of rewards for reinforcement learning to control the generation of ironic sentences. Experimental results demonstrate the effectiveness of our model in terms of irony accuracy, sentiment preservation, and content preservation.


Cooperative Learning of Disjoint Syntax and Semantics

Feb 25, 2019
Serhii Havrylov, Germán Kruszewski, Armand Joulin

There has been considerable attention devoted to models that learn to jointly infer an expression's syntactic structure and its semantics. Yet, Nangia and Bowman (2018) have recently shown that the current best systems fail to learn the correct parsing strategy on mathematical expressions generated from a simple context-free grammar. In this work, we present a recursive model inspired by Choi et al. (2018) that reaches near-perfect accuracy on this task. Our model is composed of two separate modules for syntax and semantics. They are cooperatively trained with standard continuous and discrete optimization schemes. Our model does not require any linguistic structure for supervision, and its recursive nature allows for out-of-domain generalization with little loss in performance. Additionally, our approach performs competitively on several natural language tasks, such as Natural Language Inference and Sentiment Analysis.

* The paper was accepted at NAACL-HLT 2019 


Machine Learning Suites for Online Toxicity Detection

Oct 03, 2018
David Noever

To identify and classify toxic online commentary, the modern tools of data science transform raw text into key features from which either thresholding or learning algorithms can make predictions for monitoring offensive conversations. We systematically evaluate 62 classifiers representing 19 major algorithmic families against features extracted from the Jigsaw dataset of Wikipedia comments. We compare the classifiers based on statistically significant differences in accuracy and relative execution time. Among these classifiers for identifying toxic comments, tree-based algorithms provide the most transparently explainable rules and rank-order the predictive contribution of each feature. Among 28 features of syntax, sentiment, emotion and outlier word dictionaries, a simple bad word list proves most predictive of offensive commentary.


Reasoning with Sarcasm by Reading In-between

May 08, 2018
Yi Tay, Luu Anh Tuan, Siu Cheung Hui, Jian Su

Sarcasm is a sophisticated speech act which commonly manifests in online communities such as Twitter and Reddit. The prevalence of sarcasm on the social web is highly disruptive to opinion mining systems, due not only to its tendency to flip polarity but also to its use of figurative language. Sarcasm commonly manifests with a contrastive theme, either between positive and negative sentiments or between literal and figurative scenarios. In this paper, we revisit the notion of modeling contrast in order to reason with sarcasm. More specifically, we propose an attention-based neural model that looks in-between instead of across, enabling it to explicitly model contrast and incongruity. We conduct extensive experiments on six benchmark datasets from Twitter, Reddit and the Internet Argument Corpus. Our proposed model not only achieves state-of-the-art performance on all datasets but also enjoys improved interpretability.

* Accepted to ACL 2018


Learning text representation using recurrent convolutional neural network with highway layers

Aug 02, 2016
Ying Wen, Weinan Zhang, Rui Luo, Jun Wang

Recently, the rapid development of word embeddings and neural networks has brought new inspiration to various NLP and IR tasks. In this paper, we describe a staged hybrid model combining Recurrent Convolutional Neural Networks (RCNN) with highway layers. The highway network module, incorporated in the middle, takes the output of the bi-directional Recurrent Neural Network (Bi-RNN) module in the first stage and provides the input to the Convolutional Neural Network (CNN) module in the last stage. Experiments show that our model outperforms common neural network models (CNN, RNN, Bi-RNN) on a sentiment analysis task. In addition, an analysis of how sequence length influences the RCNN with highway layers shows that our model can learn good representations of long text.

* Neu-IR '16 SIGIR Workshop on Neural Information Retrieval 
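
The highway layer mentioned in the abstract above is usually written as y = T(x) * H(x) + (1 - T(x)) * x, where H is a nonlinear transform and T is a gate with values in (0, 1). A minimal pure-Python sketch of one such layer follows. The choice of tanh for H and a sigmoid for T, as well as the parameter names `W_h`, `b_h`, `W_t`, `b_t`, are assumptions for illustration; the abstract does not state the configuration used in the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Compute W @ x + b for a list-of-rows matrix W and vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: gate between a transform of x and x itself.

    Input and output must have the same dimension so that the
    carry path (1 - T(x)) * x is well defined.
    """
    h = [math.tanh(v) for v in affine(W_h, b_h, x)]  # transform H(x)
    t = [sigmoid(v) for v in affine(W_t, b_t, x)]    # gate T(x)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


zeros = [[0.0, 0.0], [0.0, 0.0]]
# With a strongly negative gate bias the layer passes its input through
# almost unchanged, which is what makes deep stacks easier to train.
carried = highway_layer([2.0, -4.0], zeros, [0.0, 0.0], zeros, [-50.0, -50.0])
```

In the staged model described above, a layer like this would sit between the Bi-RNN output and the CNN input.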


Sentence-level Privacy for Document Embeddings

May 10, 2022
Casey Meehan, Khalil Mrini, Kamalika Chaudhuri

User language data can contain highly sensitive personal content. As such, it is imperative to offer users a strong and interpretable privacy guarantee when learning from their data. In this work, we propose SentDP: pure local differential privacy at the sentence level for a single user document. We propose a novel technique, DeepCandidate, that combines concepts from robust statistics and language modeling to produce high-dimensional, general-purpose $\epsilon$-SentDP document embeddings. This guarantees that any single sentence in a document can be substituted with any other sentence while keeping the embedding $\epsilon$-indistinguishable. Our experiments indicate that these private document embeddings are useful for downstream tasks like sentiment analysis and topic classification and even outperform baseline methods with weaker guarantees like word-level Metric DP.

* Presented at ACL 2022 main conference 


UDALM: Unsupervised Domain Adaptation through Language Modeling

Apr 14, 2021
Constantinos Karouzos, Georgios Paraskevopoulos, Alexandros Potamianos

In this work we explore Unsupervised Domain Adaptation (UDA) of pretrained language models for downstream tasks. We introduce UDALM, a fine-tuning procedure using a mixed classification and Masked Language Model loss, which can adapt to the target domain distribution in a robust and sample-efficient manner. Our experiments show that the performance of models trained with the mixed loss scales with the amount of available target data, and that the mixed loss can be effectively used as a stopping criterion during UDA training. Furthermore, we discuss the relationship between A-distance and the target error and explore some limitations of the Domain Adversarial Training approach. Our method is evaluated on twelve domain pairs of the Amazon Reviews Sentiment dataset, yielding 91.74% accuracy, a 1.11% absolute improvement over the state of the art.

* Accepted for publication in 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 
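
The "mixed classification and Masked Language Model loss" described above can be sketched as a weighted sum of two cross-entropy terms. This is only an illustration under stated assumptions: the weight `lam`, its default value, and the function names are hypothetical, and the abstract does not specify how the two terms are actually balanced. In an unsupervised domain adaptation setting the classification term can only come from labeled source examples, while the MLM term would presumably be computed on unlabeled target text.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a distribution."""
    return -math.log(probs[target])


def mixed_loss(clf_probs, clf_label, mlm_probs, mlm_targets, lam=0.5):
    """Combine a classification loss and a masked-LM loss.

    clf_probs   : class distribution predicted for one labeled example
    clf_label   : index of the gold class
    mlm_probs   : one vocabulary distribution per masked position
    mlm_targets : index of the original token at each masked position
    lam         : hypothetical mixing weight in [0, 1] (an assumption)
    """
    clf_loss = cross_entropy(clf_probs, clf_label)
    mlm_loss = sum(
        cross_entropy(p, t) for p, t in zip(mlm_probs, mlm_targets)
    ) / len(mlm_targets)
    return lam * clf_loss + (1.0 - lam) * mlm_loss


loss = mixed_loss([0.5, 0.5], 0, [[0.25, 0.75]], [1], lam=0.5)
```

Because the MLM term needs no labels, a combined quantity like this can be monitored on target-domain data, which is consistent with the abstract's remark about using the mixed loss as a stopping criterion.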


How does Truth Evolve into Fake News? An Empirical Study of Fake News Evolution

Mar 10, 2021
Mingfei Guo, Xiuying Chen, Juntao Li, Dongyan Zhao, Rui Yan

Automatically identifying fake news from the Internet is a challenging problem in deception detection tasks. Online news is modified constantly during its propagation, e.g., malicious users distort the original truth and make up fake news. This continuous evolution process can generate unprecedented fake news and fool the original detection model. We present the Fake News Evolution (FNE) dataset: a new dataset tracking the fake news evolution process. Our dataset is composed of 950 paired samples, each of which consists of articles representing the three significant phases of the evolution process: the truth, the fake news, and the evolved fake news. We examine several features over the course of the evolution: disinformation techniques, text similarity, top 10 keywords, classification accuracy, parts of speech, and sentiment properties.

* The Web Conference 2021, Workshop on News Recommendation and Intelligence 
* 5 pages, 2 figures 


Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT

Feb 27, 2020
Lichao Sun, Kazuma Hashimoto, Wenpeng Yin, Akari Asai, Jia Li, Philip Yu, Caiming Xiong

There is an increasing amount of literature that claims deep neural networks are brittle when dealing with maliciously created adversarial examples. It is unclear, however, how the models will perform in realistic scenarios where natural rather than malicious adversarial instances often exist. This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data, particularly inadvertent keyboard typing mistakes. Intensive experiments on sentiment analysis and question answering benchmarks indicate that: (i) typos in different words of a sentence do not have equal influence, and typos in informative words cause more severe damage; (ii) mistyping is the most damaging factor, compared with insertion, deletion, etc.; (iii) humans and machines focus on different things when recognizing adversarial attacks.
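
The three kinds of keyboard noise named in the abstract above (mistyping, insertion, deletion) can be simulated with a few lines of code. The sketch below is an assumption-laden illustration, not the paper's generation procedure: the function names and the partial QWERTY adjacency map `NEIGHBORS` are hypothetical.

```python
import random
import string

# Hypothetical, partial QWERTY adjacency map (illustration only).
NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "o": "iklp",
    "s": "awedxz", "t": "rfgy", "n": "bhjm",
}


def mistype(word: str, rng: random.Random) -> str:
    """Replace one character with a neighboring key, if any is mapped."""
    positions = [i for i, ch in enumerate(word) if ch.lower() in NEIGHBORS]
    if not positions:
        return word  # nothing we know how to perturb
    i = rng.choice(positions)
    return word[:i] + rng.choice(NEIGHBORS[word[i].lower()]) + word[i + 1:]


def insert_char(word: str, rng: random.Random) -> str:
    """Insert one random lowercase letter at a random position."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]


def delete_char(word: str, rng: random.Random) -> str:
    """Delete one random character; leave one-letter words untouched."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


rng = random.Random(0)  # fixed seed so runs are repeatable
noisy = [mistype("great", rng), insert_char("great", rng), delete_char("great", rng)]
```

Applying such perturbations to informative words versus function words is one way to probe the sensitivity difference reported in finding (i).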
