Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Variants of BERT, Random Forests and SVM approach for Multimodal Emotion-Target Sub-challenge

Jul 28, 2020
Hoang Manh Hung, Hyung-Jeong Yang, Soo-Hyung Kim, Guee-Sang Lee

Emotion recognition has become a major problem in computer vision in recent years that made a lot of effort by researchers to overcome the difficulties in this task. In the field of affective computing, emotion recognition has a wide range of applications, such as healthcare, robotics, human-computer interaction. Due to its practical importance for other tasks, many techniques and approaches have been investigated for different problems and various data sources. Nevertheless, comprehensive fusion of the audio-visual and language modalities to get the benefits from them is still a problem to solve. In this paper, we present and discuss our classification methodology for MuSe-Topic Sub-challenge, as well as the data and results. For the topic classification, we ensemble two language models which are ALBERT and RoBERTa to predict 10 classes of topics. Moreover, for the classification of valence and arousal, SVM and Random forests are employed in conjunction with feature selection to enhance the performance.

* 3 pages, 2 figures 

  Access Paper or Ask Questions

Same Side Stance Classification Task: Facilitating Argument Stance Classification by Fine-tuning a BERT Model

Apr 23, 2020
Stefan Ollinger, Lorik Dumani, Premtim Sahitaj, Ralph Bergmann, Ralf Schenkel

Research on computational argumentation is currently being intensively investigated. The goal of this community is to find the best pro and con arguments for a user given topic either to form an opinion for oneself, or to persuade others to adopt a certain standpoint. While existing argument mining methods can find appropriate arguments for a topic, a correct classification into pro and con is not yet reliable. The same side stance classification task provides a dataset of argument pairs classified by whether or not both arguments share the same stance and does not need to distinguish between topic-specific pro and con vocabulary but only the argument similarity within a stance needs to be assessed. The results of our contribution to the task are build on a setup based on the BERT architecture. We fine-tuned a pre-trained BERT model for three epochs and used the first 512 tokens of each argument to predict if two arguments share the same stance.

  Access Paper or Ask Questions

Comparing the hierarchy of keywords in on-line news portals

Jun 20, 2016
Gergely Tibély, David Sousa-Rodrigues, Péter Pollner, Gergely Palla

The tagging of on-line content with informative keywords is a widespread phenomenon from scientific article repositories through blogs to on-line news portals. In most of the cases, the tags on a given item are free words chosen by the authors independently. Therefore, relations among keywords in a collection of news items is unknown. However, in most cases the topics and concepts described by these keywords are forming a latent hierarchy, with the more general topics and categories at the top, and more specialised ones at the bottom. Here we apply a recent, cooccurrence-based tag hierarchy extraction method to sets of keywords obtained from four different on-line news portals. The resulting hierarchies show substantial differences not just in the topics rendered as important (being at the top of the hierarchy) or of less interest (categorised low in the hierarchy), but also in the underlying network structure. This reveals discrepancies between the plausible keyword association frameworks in the studied news portals.

  Access Paper or Ask Questions

Learning to Rank Query Recommendations by Semantic Similarities

Apr 12, 2012
Sumio Fujita, Georges Dupret, Ricardo Baeza-Yates

Logs of the interactions with a search engine show that users often reformulate their queries. Examining these reformulations shows that recommendations that precise the focus of a query are helpful, like those based on expansions of the original queries. But it also shows that queries that express some topical shift with respect to the original query can help user access more rapidly the information they need. We propose a method to identify from the query logs of past users queries that either focus or shift the initial query topic. This method combines various click-based, topic-based and session based ranking strategies and uses supervised learning in order to maximize the semantic similarities between the query and the recommendations, while at the same diversifying them. We evaluate our method using the query/click logs of a Japanese web search engine and we show that the combination of the three methods proposed is significantly better than any of them taken individually.

* 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD2012) in the 21st International World Wide Web Conference (WWW2012), Lyon, France, April 17th, 2012 

  Access Paper or Ask Questions

Simultaneous Identification of Tweet Purpose and Position

Dec 24, 2019
Rahul Radhakrishnan Iyer, Yulong Pei, Katia Sycara

Tweet classification has attracted considerable attention recently. Most of the existing work on tweet classification focuses on topic classification, which classifies tweets into several predefined categories, and sentiment classification, which classifies tweets into positive, negative and neutral. Since tweets are different from conventional text in that they generally are of limited length and contain informal, irregular or new words, so it is difficult to determine user intention to publish a tweet and user attitude towards certain topic. In this paper, we aim to simultaneously classify tweet purpose, i.e., the intention for user to publish a tweet, and position, i.e., supporting, opposing or being neutral to a given topic. By transforming this problem to a multi-label classification problem, a multi-label classification method with post-processing is proposed. Experiments on real-world data sets demonstrate the effectiveness of this method and the results outperform the individual classification methods.

* 8 pages, 3 figures 

  Access Paper or Ask Questions

Comparing Two Different Approaches in Big Data and Business Analysis for Churn Prediction with the Focus on How Apache Spark Employed

May 28, 2021
Mohammad Sina Kiarostami

Due to the significant importance of Big Data analysis, especially in business-related topics such as improving services, finding potential customers, and selecting practical approaches to manage income and expenses, many companies attempt to collaborate with scientists to find how, why, and what they should analysis. In this work, we would like to compare and discuss two different approaches that employed in business analysis topic in Big Data with more consideration on how they utilized Spark. Both studies have investigated Churn Prediction as their case study for their proposed approaches since it is an essential topic in business analysis for companies to recognize a customer intends to leave or stop using their services. Here, we focus on Apache Spark since it has provided several solutions to handle a massive amount of data in recent years efficiently. This feature in Spark makes it one of the most robust candidate tools to upfront with a Big Data problem, particularly time and resource are concerns.

  Access Paper or Ask Questions

Natural Language Understanding for Argumentative Dialogue Systems in the Opinion Building Domain

Mar 03, 2021
Waheed Ahmed Abro, Annalena Aicher, Niklas Rach, Stefan Ultes, Wolfgang Minker, Guilin Qi

This paper introduces a natural language understanding (NLU) framework for argumentative dialogue systems in the information-seeking and opinion building domain. Our approach distinguishes multiple user intents and identifies system arguments the user refers to in his or her natural language utterances. Our model is applicable in an argumentative dialogue system that allows the user to inform him-/herself about and build his/her opinion towards a controversial topic. In order to evaluate the proposed approach, we collect user utterances for the interaction with the respective system and labeled with intent and reference argument in an extensive online study. The data collection includes multiple topics and two different user types (native speakers from the UK and non-native speakers from China). The evaluation indicates a clear advantage of the utilized techniques over baseline approaches, as well as a robustness of the proposed approach against new topics and different language proficiency as well as cultural background of the user.

  Access Paper or Ask Questions

Regular Expressions for Fast-response COVID-19 Text Classification

Feb 19, 2021
Igor L. Markov, Jacqueline Liu, Adam Vagner

Text classifiers are at the core of many NLP applications and use a variety of algorithmic approaches and software. This paper describes how Facebook determines if a given piece of text - anything from a hashtag to a post - belongs to a narrow topic such as COVID-19. To fully define a topic and evaluate classifier performance we employ human-guided iterations of keyword discovery, but do not require labeled data. For COVID-19, we build two sets of regular expressions: (1) for 66 languages, with 99% precision and recall >50%, (2) for the 11 most common languages, with precision >90% and recall >90%. Regular expressions enable low-latency queries from multiple platforms. Response to challenges like COVID-19 is fast and so are revisions. Comparisons to a DNN classifier show explainable results, higher precision and recall, and less overfitting. Our learnings can be applied to other narrow-topic classifiers.

* 9 pages, 7 tables 

  Access Paper or Ask Questions

Forensic Shoe-print Identification: A Brief Survey

Jan 05, 2019
Imad Rida, Sambit Bakshi, Xiaojun Chang, Hugo Proenca

As an advanced research topic in forensics science, automatic shoe-print identification has been extensively studied in the last two decades, since shoe marks are the clues most frequently left in a crime scene. Hence, these impressions provide a pertinent evidence for the proper progress of investigations in order to identify the potential criminals. The main goal of this survey is to provide a cohesive overview of the research carried out in forensic shoe-print identification and its basic background. Apart defining the problem and describing the phases that typically compose the processing chain of shoe-print identification, we provide a summary/comparison of the state-of-the-art approaches, in order to guide the neophyte and help to advance the research topic. This is done through introducing simple and basic taxonomies as well as summaries of the state-of-the-art performance. Lastly, we discuss the current open problems and challenges in this research topic, point out for promising directions in this field.

  Access Paper or Ask Questions

Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review

Apr 26, 2015
Gordon V. Cormack, Maura R. Grossman

We enhance the autonomy of the continuous active learning method shown by Cormack and Grossman (SIGIR 2014) to be effective for technology-assisted review, in which documents from a collection are retrieved and reviewed, using relevance feedback, until substantially all of the relevant documents have been reviewed. Autonomy is enhanced through the elimination of topic-specific and dataset-specific tuning parameters, so that the sole input required by the user is, at the outset, a short query, topic description, or single relevant document; and, throughout the review, ongoing relevance assessments of the retrieved documents. We show that our enhancements consistently yield superior results to Cormack and Grossman's version of continuous active learning, and other methods, not only on average, but on the vast majority of topics from four separate sets of tasks: the legal datasets examined by Cormack and Grossman, the Reuters RCV1-v2 subject categories, the TREC 6 AdHoc task, and the construction of the TREC 2002 filtering test collection.

  Access Paper or Ask Questions