Noura Farra

SemEval-2017 Task 4: Sentiment Analysis in Twitter

Dec 02, 2019
Sara Rosenthal, Noura Farra, Preslav Nakov

This paper describes the fifth year of the Sentiment Analysis in Twitter task. SemEval-2017 Task 4 continues with a rerun of the subtasks of SemEval-2016 Task 4, which include identifying the overall sentiment of the tweet, sentiment towards a topic with classification on a two-point and on a five-point ordinal scale, and quantification of the distribution of sentiment towards a topic across a number of tweets: again on a two-point and on a five-point ordinal scale. Compared to 2016, we made two changes: (i) we introduced a new language, Arabic, for all subtasks, and (ii) we made available information from the profiles of the Twitter users who posted the target tweets. The task continues to be very popular, with a total of 48 teams participating this year.

* sentiment analysis, Twitter, classification, quantification, ranking, English, Arabic 
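
The quantification subtasks ask for the distribution of sentiment towards a topic across a set of tweets, rather than a label per tweet. A minimal sketch of the simplest baseline for this, often called "classify and count", might look like the following. The function name, the class names, and the input format are assumptions made for illustration; this is not the method of any participating system.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes=("positive", "negative")):
    """Estimate class prevalence for one topic from per-tweet predictions.

    `predicted_labels` is a list of labels already produced by some
    classifier (hypothetical input). Labels outside `classes` still count
    towards the total but are not reported.
    """
    if not predicted_labels:
        raise ValueError("need at least one predicted label")
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    # Fraction of tweets assigned to each class of interest.
    return {label: counts[label] / total for label in classes}


# Two-point scale: four tweets about one topic.
distribution = classify_and_count(
    ["positive", "positive", "negative", "positive"]
)
```

The same function covers the five-point case by passing five ordinal class names in `classes`.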

SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)

Apr 16, 2019
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

This paper presents the results and main findings of the Identifying and Categorizing Offensive Language in Social Media (OffensEval) shared task organized with SemEval-2019. SemEval-2019 Task 6 provided participants with the Offensive Language Identification Dataset (OLID), an annotated dataset containing over 14,000 English tweets. The competition was divided into three sub-tasks. In sub-task A, systems were trained to discriminate between offensive and non-offensive tweets; in sub-task B, systems were trained to identify the type of offensive content in the post; and in sub-task C, systems were trained to identify the target of offensive posts. OffensEval attracted a large number of participants and was one of the most popular tasks in SemEval-2019. In total, nearly 800 teams signed up to participate in the task, and 115 of them submitted results, which are presented and analyzed in this report.

* Proceedings of the International Workshop on Semantic Evaluation (SemEval) 
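
The three sub-tasks form a hierarchy: the type of offense only applies to offensive posts, and a target only applies once a type has been assigned. A minimal sketch of how such a label could be represented and checked is given below. The class name, field names, and example label strings are hypothetical and chosen for illustration; they are not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HierarchicalLabel:
    """One post's label across the three levels (hypothetical field names)."""

    offensive: bool                      # sub-task A: offensive or not
    offense_type: Optional[str] = None   # sub-task B: type of offensive content
    target: Optional[str] = None         # sub-task C: target of the offense

    def __post_init__(self):
        # Levels B and C are only meaningful for offensive posts.
        if not self.offensive and (self.offense_type or self.target):
            raise ValueError("non-offensive posts carry no type or target")
        # Level C is only meaningful once level B has been assigned.
        if self.target and not self.offense_type:
            raise ValueError("a target requires an offense type")


# Example labels; the strings are placeholders, not official label names.
clean = HierarchicalLabel(offensive=False)
full = HierarchicalLabel(offensive=True, offense_type="targeted", target="group")
```

Encoding the constraints in the constructor means an inconsistent annotation fails at load time rather than during evaluation.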

Predicting the Type and Target of Offensive Posts in Social Media

Apr 16, 2019
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar

As offensive content has become pervasive in social media, there has been much research in identifying potentially offensive messages. However, previous work on this topic did not consider the problem as a whole, but rather focused on detecting very specific types of offensive content, e.g., hate speech, cyberbullying, or cyber-aggression. In contrast, here we target several different kinds of offensive content. In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media. For this purpose, we compiled the Offensive Language Identification Dataset (OLID), a new dataset with tweets annotated for offensive content using a fine-grained three-layer annotation scheme, which we make publicly available. We discuss the main similarities and differences between OLID and pre-existing datasets for hate speech identification, aggression detection, and similar tasks. We further experiment with and compare the performance of different machine learning models on OLID.

* Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 

The ARIEL-CMU Systems for LoReHLT18

Feb 24, 2019
Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

SMARTies: Sentiment Models for Arabic Target Entities

Jan 12, 2017
Noura Farra, Kathleen McKeown

We consider entity-level sentiment analysis in Arabic, a morphologically rich language with increasing resources. We present a system that is applied to complex posts written in response to Arabic newspaper articles. Our goal is to identify important entity "targets" within the post along with the polarity expressed about each target. We achieve significant improvements over multiple baselines, demonstrating that the use of specific morphological representations improves the performance of identifying both important targets and their sentiment, and that the use of distributional semantic clusters further boosts performance for these representations, especially when richer linguistic resources are not available.

* To be published in Proceedings of the European Chapter of the Association for Computational Linguistics (EACL 2017) 