Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox


Building a Sentiment Corpus of Tweets in Brazilian Portuguese

Dec 24, 2017
Henrico Bertini Brum, Maria das Graças Volpe Nunes


Share this with someone who'll enjoy it:


The large amount of data available in social media, forums and websites motivates researches in several areas of Natural Language Processing, such as sentiment analysis. The popularity of the area due to its subjective and semantic characteristics motivates research on novel methods and approaches for classification. Hence, there is a high demand for datasets on different domains and different languages. This paper introduces TweetSentBR, a sentiment corpora for Brazilian Portuguese manually annotated with 15.000 sentences on TV show domain. The sentences were labeled in three classes (positive, neutral and negative) by seven annotators, following literature guidelines for ensuring reliability on the annotation. We also ran baseline experiments on polarity classification using three machine learning methods, reaching 80.99% on F-Measure and 82.06% on accuracy in binary classification, and 59.85% F-Measure and 64.62% on accuracy on three point classification.

* Accepted for publication in 11th International Conference on Language Resources and Evaluation (LREC 2018) 


   Access Paper Source



Share this with someone who'll enjoy it: