The Web has become the main platform where people express their opinions about entities of interest and their associated aspects. Aspect-Based Sentiment Analysis (ABSA) aims to automatically compute the sentiment towards these aspects from opinionated text. In this paper, we extend the state-of-the-art Hybrid Approach for Aspect-Based Sentiment Analysis (HAABSA) method in two directions. First, we replace the non-contextual word embeddings with deep contextual word embeddings in order to better capture word semantics in a given text. Second, we use hierarchical attention by adding an extra attention layer to the HAABSA high-level representations in order to increase the method's flexibility in modeling the input data. Using two standard datasets (SemEval 2015 and SemEval 2016), we show that the proposed extensions improve the accuracy of the built model for ABSA.
While ubiquitous, textual sources of information such as company reports and social media posts are rarely included in prediction algorithms for time series, despite the relevant information they may contain. In this work, openly accessible daily weather reports from France and the United Kingdom are leveraged to predict time series of national electricity consumption, average temperature and wind speed with a single pipeline. Two methods of numerical representation of text are considered, namely traditional Term Frequency-Inverse Document Frequency (TF-IDF) and our own neural word embedding. Using exclusively text, we are able to predict the aforementioned time series with sufficient accuracy to be used to replace missing data. Furthermore, the proposed word embeddings display geometric properties relating to the behavior of the time series and context similarity between words.
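As an illustration of the TF-IDF representation mentioned above, the following is a minimal sketch using only the Python standard library. The abstract does not state which TF-IDF variant or tokenizer is used, so the weighting here (relative term frequency times log(N / df)), the whitespace tokenizer, and the three sample reports are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    This is one common TF-IDF variant, chosen for illustration only.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical weather reports, invented for this example.
reports = [
    "cold and windy in the north",
    "mild and sunny in the south",
    "cold with heavy rain",
]
vectors = tfidf(reports)
```

In this toy corpus, a term that appears in a single report (such as "windy") receives a higher weight than a term shared by two reports (such as "cold"), which is the behavior that makes TF-IDF useful as a text feature.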
Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task has achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures. The attention computed by attention heads does not seem to match human intuitions about hierarchical structures. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage them to follow tree structures. The tree structures can be automatically induced from raw texts by our proposed "Constituent Attention" module, which is simply implemented by self-attention between two adjacent words. With a training procedure identical to BERT's, the experiments demonstrate the effectiveness of Tree Transformer in terms of inducing tree structures, better language modeling, and learning more explainable attention scores.
Statistical methods applied to social media posts shed light on the dynamics of online dialogue. For example, users' wording choices predict their persuasiveness, and users adopt the language patterns of other dialogue participants. In this paper, we estimate the causal effect of reply tones in debates on linguistic and sentiment changes in subsequent responses. The challenge for this estimation is that a reply's tone and subsequent responses are confounded by the users' ideologies on the debate topic and their emotions. To overcome this challenge, we learn representations of ideology using generative models of text. We study debates from 4Forums and compare annotated reply tones such as emotional versus factual, or reasonable versus attacking. We show that our latent confounder representation reduces bias in average treatment effect (ATE) estimation. Our results suggest that factual and asserting tones affect dialogue and provide a methodology for estimating causal effects from text.
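To make the role of confounder adjustment concrete, here is a minimal sketch that contrasts a naive difference in means with a stratified ATE estimate. It is not the method of the paper, which learns a latent confounder representation from text; here the confounder is assumed to be a single observed discrete variable, and the function names and toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def naive_ate(rows):
    """Difference in mean outcome between treated and control rows."""
    treated = [y for t, _, y in rows if t == 1]
    control = [y for t, _, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_ate(rows):
    """Average the within-stratum effects, weighted by stratum size.

    Assumes every stratum contains both treated and control rows.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for t, z, y in rows:
        strata[z][t].append(y)
    total = len(rows)
    ate = 0.0
    for groups in strata.values():
        size = len(groups[0]) + len(groups[1])
        ate += (size / total) * (mean(groups[1]) - mean(groups[0]))
    return ate


# Toy rows of (treatment, confounder, outcome), invented for this example.
# Within each stratum the effect is 1.0, but the confounder raises both
# the chance of treatment and the outcome.
rows = (
    [(1, 0, 2.0)] + [(0, 0, 1.0)] * 3 +
    [(1, 1, 6.0)] * 3 + [(0, 1, 5.0)]
)
```

On this toy data the naive estimate is 3.0 while the stratified estimate recovers the within-stratum effect of 1.0, which shows how an unadjusted comparison can be biased when tone and response share a common cause.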
Successful applications that process sequential data, such as text and speech, require improved generalization performance from recurrent neural networks (RNNs). Dropout techniques for RNNs were introduced to respond to these demands, but we conjecture that dropout on RNNs could be improved by adopting the adversarial concept. This paper investigates ways to improve dropout for RNNs by utilizing intentionally generated dropout masks. Specifically, the guided dropout used in this research is called adversarial dropout, which adversarially disconnects neurons that are dominantly used to predict correct targets over time. Our analysis showed that our regularizer, which consists of a gap between the original and the reconfigured RNNs, was an upper bound on the gap between the training and the inference phases of random dropout. We demonstrated that minimizing our regularizer improved the effectiveness of dropout for RNNs on sequential MNIST tasks, semi-supervised text classification tasks, and language modeling tasks.
In the Humanities and Social Sciences, there is increasing interest in approaches to information extraction, prediction, intelligent linkage, and dimension reduction applicable to large text corpora. With approaches in these fields being grounded in traditional statistical techniques, the need arises for frameworks whereby advanced NLP techniques such as topic modelling may be incorporated within classical methodologies. This paper provides a classical, supervised, statistical learning framework for prediction from text, using topic models as a data reduction method and the topics themselves as predictors, alongside typical statistical tools for predictive modelling. As examples, we apply this framework in a Social Sciences context (applied animal behaviour) and a Humanities context (narrative analysis). The results show that topic regression models perform comparably to their much less efficient equivalents that use individual words as predictors.
We explore a human-driven approach to annotation, curated training (CT), in which annotation is framed as teaching the system by using interactive search to identify informative snippets of text to annotate, unlike traditional approaches, which either annotate preselected text or use active learning. A trained annotator performed 80 hours of CT for the thirty event types of the NIST TAC KBP Event Argument Extraction evaluation. Combining this annotation with ACE results in a 6% reduction in error, and the learning curve of CT plateaus more slowly than for full-document annotation. Three NLP researchers performed CT for one event type and showed much sharper learning curves, with all three exceeding ACE performance in less than ninety minutes, suggesting that CT can provide further benefits when the annotator deeply understands the system.
Effective representation of a text is critical for various natural language processing tasks. For the particular task of Chinese sentiment analysis, it is important to understand and choose an effective representation of a text from different forms of Chinese representations such as word, character and pinyin. This paper presents a systematic study of the effect of these representations for Chinese sentiment analysis by proposing a multi-channel convolutional neural network (MCCNN), where each channel corresponds to a representation. Experimental results show that: (1) the word representation wins on the dataset with a low out-of-vocabulary (OOV) rate, while the character representation wins otherwise; (2) using these representations in combination generally improves the performance; (3) the representations based on MCCNN outperform conventional n-gram features using SVM; (4) the proposed MCCNN model achieves competitive performance against the state-of-the-art model fastText for Chinese sentiment analysis.
Conventional text classification models make a bag-of-words assumption, reducing text to word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an entirely unsupervised manner using a contextual window, and of doing so much faster than previous methods. Each word is projected into a vector space such that words with similar meanings, such as "strong" and "powerful", are projected into the same general region of Euclidean space. Open questions about these embeddings include their utility across classification tasks and the optimal properties and source of documents to construct broadly functional embeddings. In this work, we demonstrate the usefulness of pre-trained embeddings for classification in our task and demonstrate that custom word embeddings, built in the domain and for the tasks, can improve performance over word embeddings learnt on more general data including news articles or Wikipedia.
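The notion that words with similar meanings lie close together in the embedding space can be sketched with cosine similarity. The three-dimensional vectors below are invented for illustration and are not taken from word2vec or from the embeddings built in this work; real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, chosen by hand for this example.
embeddings = {
    "strong": [0.9, 0.8, 0.1],
    "powerful": [0.85, 0.75, 0.2],
    "paris": [0.1, 0.2, 0.95],
}

similar = cosine_similarity(embeddings["strong"], embeddings["powerful"])
dissimilar = cosine_similarity(embeddings["strong"], embeddings["paris"])
```

With these hand-picked vectors, `similar` is close to 1 and `dissimilar` is much lower, mirroring the "strong" and "powerful" example in the text.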
The WASSA 2017 EmoInt shared task aims to predict emotion intensity values of tweet messages. Given the text of a tweet and its emotion category (anger, joy, fear, or sadness), the participants were asked to build a system that assigns emotion intensity values. Emotion intensity estimation is a challenging problem given the short length of the tweets, the noisy structure of the text and the lack of annotated data. To solve this problem, we developed an ensemble of two neural models, processing input at the character and word level, with a lexicon-driven system. The correlation scores across all four emotions are averaged to determine the bottom-line competition metric, and our system ranks fourth in the full intensity range and third in the 0.5-1 intensity range among 23 systems at the time of writing (June 2017).
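The averaged metric described above can be sketched as follows. The abstract does not name the correlation measure, so this example assumes the Pearson correlation coefficient; the function names and the toy gold and predicted intensities are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def averaged_score(gold, pred):
    """Mean of the per-emotion correlations between gold and predictions."""
    return sum(pearson(gold[e], pred[e]) for e in gold) / len(gold)


# Toy intensities, invented for this example.
gold = {e: [0.1, 0.4, 0.8] for e in ("anger", "joy", "fear", "sadness")}
pred = {
    "anger": [0.2, 0.5, 0.9],    # shifted copy of gold: correlation 1
    "joy": [0.05, 0.2, 0.4],     # scaled copy of gold: correlation 1
    "fear": [0.1, 0.4, 0.8],     # identical to gold: correlation 1
    "sadness": [0.8, 0.5, 0.1],  # reversed trend: correlation -1
}
score = averaged_score(gold, pred)
```

Because each emotion contributes equally to the average, a system that does poorly on one emotion (here, sadness) is penalized even if it is perfect on the other three.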