Given the damage that the rapid spread of fake news can cause in sectors such as politics and finance, the automatic identification of fake news through linguistic analysis has drawn the attention of the research community. However, such methods have been developed largely for English, leaving low-resource languages out of focus. Yet the risks posed by fake and manipulative news are not confined to any one language. In this work, we propose an annotated dataset of ~50K news articles that can be used to build automated fake news detection systems for a low-resource language such as Bangla. We also provide an analysis of the dataset and develop a benchmark system using state-of-the-art NLP techniques to identify Bangla fake news. To build this system, we explore traditional linguistic features and neural network-based methods. We expect this dataset to be a valuable resource for building technologies that curb the spread of fake news and for research on low-resource languages.
In recent years, abusive behavior has become a serious issue in online social networks. In this paper, we present a new corpus from a semi-anonymous social media platform that contains instances of the offensive and neutral classes. We introduce a single deep neural architecture that considers both local and sequential information in the text to detect abusive language. Along with this model, we introduce a new attention mechanism, called emotion-aware attention, which uses the emotions behind the text to find the most important words within it. We evaluate this model on our dataset and present an analysis of the results. Additionally, we evaluate our proposed method on other corpora and report new state-of-the-art results for offensive language detection.
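The abstract does not spell out how emotions enter the attention computation. As one plausible illustration only (an assumption, not the paper's formulation), the sketch below scores each word from both its hidden state and a per-word emotion vector, normalizes the scores with a softmax, and pools the hidden states into a context vector. The function and parameter names (`emotion_aware_attention`, `w_text`, `w_emo`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def emotion_aware_attention(hidden: List[Vector], emotions: List[Vector],
                            w_text: Vector, w_emo: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical emotion-conditioned attention.

    Each word's score combines a projection of its hidden state (w_text)
    with a projection of its emotion vector (w_emo). Returns the attention
    weights and the attention-weighted sum of hidden states.
    """
    scores = [dot(w_text, h) + dot(w_emo, e) for h, e in zip(hidden, emotions)]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(a * h[k] for a, h in zip(weights, hidden)) for k in range(dim)]
    return weights, context


# Toy example: three words, 2-d hidden states, 2-d emotion vectors (e.g. anger, joy).
weights, context = emotion_aware_attention(
    hidden=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    emotions=[[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]],
    w_text=[0.0, 0.0],   # ignore the text projection to isolate the emotion effect
    w_emo=[1.0, 0.0],    # attend to the first emotion dimension only
)
```

In a trained model, `w_text` and `w_emo` would be learned parameters and the emotion vectors would come from a lexicon or an emotion classifier; here the second word receives the largest weight because it carries the strongest emotion signal.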
This paper considers the problem of characterizing stories by inferring attributes such as theme and genre from the written narrative and user reviews. We experiment with a multi-label dataset of movie narratives and a tagset representing various story attributes. To identify these attributes, we propose a hierarchical representation of narratives that improves over both traditional feature-based machine learning methods and sequential representation approaches. Finally, we demonstrate a multi-view method for discovering story attributes from user opinions in reviews, which complement the gold-standard dataset.
Film culture has grown tremendously in recent years. The large number of streaming services has made films one of the most convenient forms of entertainment in today's world. Films can help us learn and can inspire societal change, but they can also affect viewers negatively. In this paper, our goal is to predict, from movie scripts, the suitability of movie content for children and young adults. As our measure of suitability we use the MPAA rating, which is designed specifically for this purpose. We propose an RNN-based architecture with attention that jointly models the genre and the emotions in the script to predict the MPAA rating. Our classification model achieves a 78% weighted F1-score, outperforming the traditional machine learning method by 6%.
The folksonomy of movies covers a wide range of heterogeneous information about movies, such as genre, plot structure, visual experiences, soundtracks, metadata, and the emotional experience of watching a movie. Being able to automatically generate or predict tags for movies can help recommendation engines improve the retrieval of similar movies and help viewers know in advance what to expect from a movie. In this work, we explore the problem of creating tags for movies from plot synopses. We propose a novel neural network model that merges information from synopses with the flow of emotions throughout the plots to predict a set of tags for each movie. We compare our system with multiple baselines and find that adding emotion flows boosts the network's performance, allowing it to learn ~18% more tags than a traditional machine learning system.
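An "emotion flow" can be pictured as a sequence of emotion scores measured across consecutive segments of a plot. The sketch below is a rough illustration under stated assumptions, not the paper's exact procedure: it splits a synopsis into a fixed number of contiguous chunks and, for each chunk, computes the share of tokens associated with each emotion according to a tiny hypothetical lexicon (`TOY_LEXICON`). A real system would use a full emotion lexicon and feed the resulting sequence into a sequence model.

```python
from typing import Dict, List

# Hypothetical toy lexicon mapping words to emotion labels (illustration only).
TOY_LEXICON: Dict[str, List[str]] = {
    "murder": ["fear", "anger"],
    "love": ["joy"],
    "wedding": ["joy"],
    "betrayal": ["anger", "sadness"],
    "death": ["sadness", "fear"],
}
EMOTIONS = ["anger", "fear", "joy", "sadness"]


def chunk(tokens: List[str], n_chunks: int) -> List[List[str]]:
    """Split tokens into n_chunks contiguous segments of near-equal size."""
    size, rem = divmod(len(tokens), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rem else 0)  # spread the remainder over the first chunks
        chunks.append(tokens[start:end])
        start = end
    return chunks


def emotion_flow(text: str, n_chunks: int = 4) -> List[Dict[str, float]]:
    """For each chunk, return the fraction of tokens linked to each emotion."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    flow = []
    for seg in chunk(tokens, n_chunks):
        counts = {e: 0 for e in EMOTIONS}
        for tok in seg:
            for emo in TOY_LEXICON.get(tok, []):
                counts[emo] += 1
        denom = max(len(seg), 1)  # avoid division by zero for empty chunks
        flow.append({e: counts[e] / denom for e in EMOTIONS})
    return flow


flow = emotion_flow("A wedding full of love. Then betrayal. Finally a murder and death.")
```

For this toy synopsis, joy dominates the early chunks while fear, anger, and sadness rise in the final chunk, which is the kind of trajectory an emotion-flow feature is meant to capture.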
This paper presents our system for the "TRAC 2018 Shared Task on Aggression Identification". Our best systems for the English dataset use a combination of lexical and semantic features, whereas for the Hindi data, lexical features alone gave the best results. We obtained weighted F1-measures of 0.5921 for the English Facebook task (ranked 12th), 0.5663 for the English Social Media task (ranked 6th), 0.6292 for the Hindi Facebook task (ranked 1st), and 0.4853 for the Hindi Social Media task (ranked 2nd).
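The weighted F1-measure reported above averages the per-class F1 scores, weighting each class by its number of true instances (its support). The minimal pure-Python sketch below shows that computation; it should agree with scikit-learn's `f1_score(..., average="weighted")`. The label strings in the example are toy values chosen for illustration.

```python
from collections import Counter
from typing import Hashable, List


def weighted_f1(y_true: List[Hashable], y_pred: List[Hashable]) -> float:
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp  # true instances of this class that were not predicted as it
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total
    return score


# Toy example with three aggression classes.
y_true = ["NAG", "NAG", "CAG", "OAG"]
y_pred = ["NAG", "CAG", "CAG", "NAG"]
print(weighted_f1(y_true, y_pred))
```

Here NAG has F1 = 0.5 with weight 2/4, CAG has F1 = 2/3 with weight 1/4, and OAG has F1 = 0 with weight 1/4, giving 0.25 + 1/6 = 5/12 ≈ 0.417. Because the weights follow class frequency, this metric favors performance on the majority classes, unlike a macro average.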
In this work, we describe the system built for the three English subtasks of SemEval 2016 Task 3 by the Department of Computer Science of the University of Houston (UH) and the Pattern Recognition and Human Language Technology (PRHLT) research center of the Universitat Politècnica de València: UH-PRHLT. Our system represents instances using both lexical and semantic similarity measures between text pairs. Our semantic features include distributed representations of words, knowledge graphs generated with the BabelNet multilingual semantic network, and the FrameNet lexical database. In our experiments, the system outperforms the random and Google search engine baselines in all three English subtasks. In subtask B, our approach obtained the best results among all task participants.
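The abstract mentions lexical similarity measures between text pairs without listing them. As a generic illustration (these two measures are assumptions, not necessarily the ones the system used), the sketch below computes Jaccard overlap of token sets and cosine similarity of bag-of-words counts. The helper names (`tokens`, `lexical_features`, `text_a`, `text_b`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

PUNCT = ".,!?;:\"'()"


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in stripped if t]


def jaccard(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    num = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    den = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return num / den if den else 0.0


def lexical_features(text_a: str, text_b: str) -> Dict[str, float]:
    """Feature dict for one text pair, e.g. as input to a downstream classifier or ranker."""
    return {"jaccard": jaccard(text_a, text_b), "cosine_bow": cosine_bow(text_a, text_b)}


print(lexical_features("The cat sat.", "The cat ran!"))
```

Such surface-overlap features are cheap but miss paraphrases; the semantic features described above (word embeddings, BabelNet, FrameNet) are what let a system match pairs that share meaning without sharing words.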
Books have the power to make us feel happiness, sadness, pain, surprise, or sorrow. An author's dexterity in using these emotions captivates readers and makes it difficult for them to put the book down. In this paper, we model the flow of emotions over a book using recurrent neural networks and quantify its usefulness in predicting the success of books. We obtained a best weighted F1-score of 69% for predicting books' success in a multitask setting (simultaneously predicting the success and genre of books).
Social tagging of movies reveals a wide range of heterogeneous information about movies, such as genre, plot structure, soundtracks, metadata, and visual and emotional experiences. Such information can be valuable for building automatic systems that create tags for movies. Automatic tagging systems can help recommendation engines improve the retrieval of similar movies and help viewers know in advance what to expect from a movie. In this paper, we set out to collect a corpus of movie plot synopses and tags. We describe a methodology that enabled us to build a fine-grained set of around 70 tags exposing heterogeneous characteristics of movie plots, along with the multi-label associations of these tags with some 14K movie plot synopses. We investigate how these tags correlate with movies and with the flow of emotions throughout different types of movies. Finally, we use this corpus to explore the feasibility of inferring tags from plot synopses. We expect the corpus to be useful in other tasks where the analysis of narratives is relevant.