"Topic": models, code, and papers

LGBTQ-AI? Exploring Expressions of Gender and Sexual Orientation in Chatbots

Jun 03, 2021
Justin Edwards, Leigh Clark, Allison Perrone

Chatbots are popular machine partners for task-oriented and social interactions. Human-human computer-mediated communication research has explored how people express their gender and sexuality in online social interactions, but little is known about whether, and in what way, chatbots do the same. We conducted semi-structured interviews with 5 text-based conversational agents to explore this topic. Through these interviews, we identified 6 common themes around the expression of gender and sexual identity: identity description, identity formation, peer acceptance, positive reflection, uncomfortable feelings, and off-topic responses. Chatbots express gender and sexuality both explicitly and by relating experiences and emotions, mimicking the human language on which they are trained. It is nevertheless evident that chatbots differ from human dialogue partners, as they lack the flexibility and understanding that lived human experience provides. While chatbots are proficient in using language to express identity, they also display a lack of authentic experiences of gender and sexuality.

Can questions summarize a corpus? Using question generation for characterizing COVID-19 research

Sep 19, 2020
Gabriela Surita, Rodrigo Nogueira, Roberto Lotufo

What are the latent questions in a given body of text? In this work, we investigate using question generation models to explore a collection of documents. Our method, dubbed corpus2question, consists of applying a pre-trained question generation model to a corpus and aggregating the resulting questions by frequency and time. This technique is an alternative to methods such as topic modelling and word clouds for summarizing large amounts of textual data. Results show that applying corpus2question to a corpus of scientific articles related to COVID-19 yields relevant questions about the topic. The most frequent questions are "what is covid 19" and "what is the treatment for covid". Among the 1000 most frequent questions are "what is the threshold for herd immunity" and "what is the role of ace2 in viral entry". We show that the proposed method generated similar questions for 13 of the 27 expert-made questions from the CovidQA question answering dataset. The code to reproduce our experiments and the generated questions are available at:

* 11 pages, 5 figures 
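The aggregation step of corpus2question (counting generated questions by frequency and by time) can be sketched with the standard library. This is a minimal illustration, not the authors' code: it assumes the question generation model has already produced hypothetical `(question, publication_date)` pairs, and the lowercase-and-strip-"?" normalization is an assumption chosen so that trivially different phrasings merge.

```python
from collections import Counter, defaultdict
from datetime import date


def aggregate_questions(generated):
    """Aggregate (question, publication_date) pairs by frequency and by month.

    `generated` is assumed to be the output of a question generation model
    run over each document; the model itself is not shown here.
    """
    overall = Counter()
    by_month = defaultdict(Counter)
    for question, pub_date in generated:
        # Normalize so that near-identical questions are counted together.
        q = question.strip().lower().rstrip("?")
        overall[q] += 1
        by_month[(pub_date.year, pub_date.month)][q] += 1
    return overall, by_month


# Hypothetical model output for three documents.
sample = [
    ("What is COVID 19?", date(2020, 3, 2)),
    ("what is covid 19", date(2020, 3, 15)),
    ("What is the treatment for covid?", date(2020, 4, 1)),
]
overall, by_month = aggregate_questions(sample)
print(overall.most_common(2))
# [('what is covid 19', 2), ('what is the treatment for covid', 1)]
```

Grouping by month is one possible time granularity; the abstract does not say which granularity the authors used.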

Exploratory Analysis of COVID-19 Related Tweets in North America to Inform Public Health Institutes

Jul 05, 2020
Hyeju Jang, Emily Rempel, Giuseppe Carenini, Naveed Janjua

Social media is a rich source for learning about people's reactions to social issues. As COVID-19 has significantly affected people's lives, it is essential to capture how people react to public health interventions and to understand their concerns. In this paper, we investigate people's reactions and concerns about COVID-19 in North America, with a particular focus on Canada. We analyze COVID-19 related tweets using topic modeling and aspect-based sentiment analysis, and interpret the results with public health experts. We compare the timeline of the topics discussed with the timing of public health interventions for COVID-19. We also examine people's sentiment about COVID-19 related issues. We discuss how the results can help public health agencies when designing policies for new interventions. Our work shows how Natural Language Processing (NLP) techniques can be applied to public health questions with the involvement of domain experts.
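One simple way to compare a topic timeline with the timing of an intervention is to measure how much of the conversation a topic occupies before and after the intervention date. The sketch below assumes each tweet has already been assigned a dominant topic label by a topic model; the tweets, labels, and cutoff date are all hypothetical, and this is not the authors' analysis pipeline.

```python
from datetime import date


def topic_share_around(tweets, topic, cutoff):
    """Share of tweets labelled `topic` before and on/after `cutoff`.

    `tweets` is a list of (tweet_date, topic_label) pairs, e.g. the dominant
    topic per tweet assigned by a topic model.
    """
    def share(labels):
        # Fraction of labels equal to `topic`; 0.0 for an empty period.
        return labels.count(topic) / len(labels) if labels else 0.0

    before = [label for day, label in tweets if day < cutoff]
    after = [label for day, label in tweets if day >= cutoff]
    return share(before), share(after)


# Hypothetical labelled tweets and a hypothetical intervention date.
tweets = [
    (date(2020, 3, 10), "travel"),
    (date(2020, 3, 12), "schools"),
    (date(2020, 3, 17), "distancing"),
    (date(2020, 3, 18), "distancing"),
    (date(2020, 3, 20), "schools"),
    (date(2020, 3, 21), "distancing"),
]
print(topic_share_around(tweets, "distancing", cutoff=date(2020, 3, 16)))
# (0.0, 0.75)
```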

Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time

May 21, 2020
Benjamin E. Nye, Ani Nenkova, Iain J. Marshall, Byron C. Wallace

We introduce Trialstreamer, a living database of clinical trial reports. Here we mainly describe the evidence extraction component, which extracts from biomedical abstracts the key pieces of information that clinicians need when appraising the literature, as well as the relations between them. Specifically, the system extracts descriptions of trial participants, the treatments compared in each arm (the interventions), and the outcomes measured. The system then attempts to infer which interventions were reported to work best by determining their relationship with the identified trial outcome measures. In addition to summarizing individual trials, these extracted data elements allow automatic synthesis of results across many trials on the same topic. We apply the system at scale to all reports of randomized controlled trials indexed in MEDLINE, powering the automatic generation of evidence maps, which provide a global view of the efficacy of different interventions by combining data from all relevant clinical trials on a topic. We make all code and models freely available alongside a demonstration of the web interface.

* 6 pages, 4 figures 
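The cross-trial synthesis behind an evidence map can be pictured as an aggregation over the extracted elements. The sketch below assumes each trial has already been reduced to hypothetical `(intervention, outcome, direction)` records by an extraction step; the field names, direction labels, and data are illustrative and are not Trialstreamer's actual schema or output.

```python
from collections import Counter, defaultdict

# Hypothetical records, one per (trial, intervention, outcome) triple, with the
# direction of the reported effect as inferred by an extraction model.
records = [
    {"trial": "T1", "intervention": "drug A", "outcome": "pain", "direction": "better"},
    {"trial": "T2", "intervention": "drug A", "outcome": "pain", "direction": "no difference"},
    {"trial": "T3", "intervention": "drug A", "outcome": "pain", "direction": "better"},
    {"trial": "T3", "intervention": "drug B", "outcome": "pain", "direction": "worse"},
]


def evidence_map(records):
    """Count reported effect directions per (intervention, outcome) cell."""
    cells = defaultdict(Counter)
    for r in records:
        cells[(r["intervention"], r["outcome"])][r["direction"]] += 1
    return cells


for (intervention, outcome), counts in sorted(evidence_map(records).items()):
    print(intervention, outcome, dict(counts))
# drug A pain {'better': 2, 'no difference': 1}
# drug B pain {'worse': 1}
```

Simple vote counting like this ignores trial size and quality; it only illustrates how per-trial extractions can be pooled into one view per intervention and outcome.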

Computational Intelligence in Sports: A Systematic Literature Review

Oct 30, 2018
Robson P. Bonidia, Luiz A. L. Rodrigues, Anderson P. Avila-Santos, Danilo S. Sanches, Jacques D. Brancher

Recently, data mining studies have been successfully conducted to estimate a range of parameters in a variety of domains. Data mining techniques have attracted the attention of the information industry and of society as a whole, owing to the large amount of available data and the pressing need to turn it into useful knowledge. However, the effective use of data in some areas is still under development, as is the case in sports, where it has grown only slightly in recent years; consequently, many sports organizations have begun to see that the data they collect holds a wealth of unexplored knowledge. This article therefore presents a systematic review of sports data mining. For the years 2010 to 2018, 31 studies on this topic were found. Based on these studies, we present the current panorama, themes, databases used, proposals, algorithms, and research opportunities. Our findings provide a better understanding of the potential of sports data mining and aim to motivate the scientific community to explore this timely and interesting topic.

* Advances in Human-Computer Interaction (

Variable Neighborhood Search for the University Lecturer-Student Assignment Problem

Sep 05, 2008
Martin Josef Geiger, Wolf Wenger

The paper presents a study of local search heuristics in general, and variable neighborhood search in particular, for solving an assignment problem that arises in the practical work of universities. Here, students have to be assigned to scientific topics that are proposed and supervised by members of staff. The problem involves optimizing the assignment under the preferences that students may express when applying for topics. Variable neighborhood search is observed to yield superior results on the tested problem instances. One instance is taken from a real case, while the others were generated from real-world data to allow a deeper analysis. An extension of the problem has been formulated by adding a second objective function that balances the workload of the members of staff while maximizing the utility of the students. The algorithmic approach has been implemented as a prototype in a computer system. An important aspect in this context is the transfer of this work to problems at other scientific institutions, and therefore the provision of decision support functionality.

* Proceedings of the 18th Mini Euro Conference on Variable Neighborhood Search, November 23-25, 2005, Puerto de La Cruz, Tenerife, Spain, ISBN 84-689-5679-1 
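A basic variable neighborhood search for this kind of student-topic assignment can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: utility is assumed to be a hypothetical preference score `prefs[s][t]`, topic capacities are assumed to be hard limits, the two neighborhoods (move one student, swap two students) are chosen for illustration, and the paper's second objective (staff workload balancing) is not modeled.

```python
import random


def utility(assign, prefs):
    """Total student utility: prefs[s][t] is student s's score for topic t."""
    return sum(prefs[s][t] for s, t in enumerate(assign))


def feasible(assign, capacity):
    """True if no topic is assigned more students than its capacity."""
    load = [0] * len(capacity)
    for t in assign:
        load[t] += 1
    return all(used <= cap for used, cap in zip(load, capacity))


def neighbors(assign, k, capacity):
    """Neighborhood k=0: move one student to another topic.
    Neighborhood k=1: swap the topics of two students (always feasible)."""
    n, m = len(assign), len(capacity)
    if k == 0:
        for s in range(n):
            for t in range(m):
                if t != assign[s]:
                    cand = assign[:]
                    cand[s] = t
                    if feasible(cand, capacity):
                        yield cand
    else:
        for a in range(n):
            for b in range(a + 1, n):
                if assign[a] != assign[b]:
                    cand = assign[:]
                    cand[a], cand[b] = cand[b], cand[a]
                    yield cand


def local_search(assign, prefs, capacity):
    """Best-improvement descent over both neighborhoods until no move helps."""
    current, current_u = assign, utility(assign, prefs)
    while True:
        best_cand, best_u = None, current_u
        for k in (0, 1):
            for cand in neighbors(current, k, capacity):
                u = utility(cand, prefs)
                if u > best_u:
                    best_cand, best_u = cand, u
        if best_cand is None:
            return current
        current, current_u = best_cand, best_u


def vns(prefs, capacity, initial, iterations=100, seed=0):
    """Basic VNS: shake in neighborhood k, descend, restart at k=0 on improvement.
    `initial` is assumed to be a feasible assignment."""
    rng = random.Random(seed)
    best = local_search(initial, prefs, capacity)
    for _ in range(iterations):
        k = 0
        while k < 2:
            moves = list(neighbors(best, k, capacity))
            shaken = rng.choice(moves) if moves else best  # random perturbation
            cand = local_search(shaken, prefs, capacity)
            if utility(cand, prefs) > utility(best, prefs):
                best, k = cand, 0  # improvement: go back to the first neighborhood
            else:
                k += 1  # no improvement: try the next neighborhood structure
    return best


# Hypothetical instance: 4 students, 2 topics with 2 places each.
prefs = [[3, 1], [3, 0], [2, 3], [0, 2]]
capacity = [2, 2]
best = vns(prefs, capacity, initial=[1, 1, 0, 0])
print(best, utility(best, prefs))
# [0, 0, 1, 1] 11
```

On this tiny instance, the initial descent already reaches the optimum; the shaking loop matters on larger instances, where descent alone tends to stall in local optima.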

Guiding Attention using Partial-Order Relationships for Image Captioning

Apr 15, 2022
Murad Popattia, Muhammad Rafi, Rizwan Qureshi, Shah Nawaz

The use of attention models for automated image captioning has enabled many systems to produce accurate and meaningful descriptions of images. Over the years, many novel approaches have been proposed to enhance the attention process using different feature representations. In this paper, we extend this approach by creating a guided attention network mechanism that exploits the relationship between the visual scene and text descriptions using spatial features from the image, high-level information from the topics, and temporal context from caption generation, all embedded together in an ordered embedding space. A pairwise ranking objective is used to train this embedding space; it allows similar images, topics, and captions in the shared semantic space to maintain a partial order in the visual-semantic hierarchy, and hence helps the model produce more visually accurate captions. Experimental results on the MSCOCO dataset show that our approach is competitive with many state-of-the-art models across various evaluation metrics.

* Accepted at CVPRW 
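To make "partial order" and "pairwise ranking objective" concrete, the sketch below uses the order-violation penalty from the order-embeddings formulation of Vendrov et al. together with a plain pairwise hinge loss. Whether this paper uses exactly this penalty, this ordering convention, or this loss is an assumption; the 2-D embeddings are hypothetical, and real models learn them with a neural network.

```python
def order_violation(x, y):
    """Order-violation penalty E(x, y) = ||max(0, y - x)||^2.

    Uses the reversed product order of order embeddings: x lies below
    (is more specific than) y iff x_i >= y_i in every coordinate, in which
    case the penalty is zero.
    """
    return sum(max(0.0, yi - xi) ** 2 for xi, yi in zip(x, y))


def pairwise_ranking_loss(positives, negatives, margin=1.0):
    """Hinge loss: each positive (x, y) pair should have a violation at least
    `margin` smaller than every negative pair."""
    loss = 0.0
    for x, y in positives:
        e_pos = order_violation(x, y)
        for xn, yn in negatives:
            loss += max(0.0, margin + e_pos - order_violation(xn, yn))
    return loss


# Hypothetical 2-D embeddings: the image should lie below its matching caption.
image = [2.0, 3.0]
matching_caption = [1.0, 1.0]   # more general: violation 0
unrelated_caption = [2.5, 1.0]  # violation (2.5 - 2.0)^2 = 0.25
loss = pairwise_ranking_loss([(image, matching_caption)],
                             [(image, unrelated_caption)])
print(loss)  # 0.75 = 1.0 + 0.0 - 0.25
```

Because the penalty is asymmetric, E(x, y) differs from E(y, x), which is what lets the embedding encode a hierarchy rather than plain similarity.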

The Power of Language: Understanding Sentiment Towards the Climate Emergency using Twitter Data

Jan 25, 2021
Arman Sarjou

Understanding how attitudes towards the Climate Emergency vary may be key to driving policy changes for effective action to mitigate climate-related risk. The Oil and Gas industry accounts for a significant proportion of global emissions, so it could be speculated that there is a relationship between Crude Oil Futures and sentiment towards the Climate Emergency. Using Latent Dirichlet Allocation for topic modelling on a bespoke Twitter dataset, this study shows that the conversation surrounding the Climate Emergency can be split into 3 distinct topics. Forecasting Crude Oil Futures using Seasonal AutoRegressive Integrated Moving Average (SARIMA) modelling gives promising results, with a root mean squared error of 0.196 on the training data and 0.209 on the testing data. The analysis of variation in attitudes towards the Climate Emergency yields inconclusive results, which could be improved using spatio-temporal analysis methods such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN).

* 6 pages, 10 figures 
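For reference, assuming the standard definition, the root mean squared error reported above for a forecast over n time steps is given below, where y_t is the observed Crude Oil Futures value at time t and ŷ_t is the SARIMA forecast. RMSE is expressed in the units of the series, so figures such as 0.196 and 0.209 can only be interpreted once the price units or any scaling applied to the data are known; the abstract does not state these.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2}
```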

Transfer Learning from LDA to BiLSTM-CNN for Offensive Language Detection in Twitter

Nov 07, 2018
Gregor Wiedemann, Eugen Ruppert, Raghav Jindal, Chris Biemann

We investigate different strategies for automatic offensive language classification on German Twitter data. For this, we employ a sequentially combined BiLSTM-CNN neural network. Based on this model, we test three transfer learning tasks that use background knowledge to improve classification performance. We compare: (1) supervised category transfer, using social media data annotated with near-offensive language categories; (2) weakly supervised category transfer, using tweets annotated with the emojis they contain; and (3) unsupervised category transfer, using tweets annotated with topic clusters obtained by Latent Dirichlet Allocation (LDA). Further, we investigate the effect of three different strategies for mitigating 'catastrophic forgetting' during transfer learning. Our results indicate that transfer learning in general improves offensive language detection. The best results are achieved by pre-training our model on the unsupervised topic clustering of tweets in combination with thematic user cluster information.

* Proceedings of GermEval 2018, 14th Conference on Natural Language Processing (KONVENS 2018) 
* 10 pages, 1 figure 

DeepTileBars: Visualizing Term Distribution for Neural Information Retrieval

Nov 01, 2018
Zhiwen Tang, Grace Hui Yang

Most neural Information Retrieval (Neu-IR) models derive query-to-document ranking scores from term-level matching. In this paper, inspired by TileBars, a classic term distribution visualization method, we propose a novel Neu-IR model that models query-to-document matching at the subtopic and higher levels. Our system first splits each document into topical segments, "visualizes" the matching between the query and the segments, and then feeds the resulting interaction matrix into a Neu-IR model, DeepTileBars, to obtain the final ranking score. DeepTileBars models relevance signals occurring at different granularities of a document's topic hierarchy, and thus better captures the document's discourse structure and matching patterns. Although its design and implementation are lightweight, DeepTileBars outperforms other state-of-the-art Neu-IR models on benchmark datasets including the Text REtrieval Conference (TREC) 2010-2012 Web Tracks and LETOR 4.0.
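The TileBars-style input can be pictured as a grid with one row per query term and one column per document segment. The sketch below builds such a grid from raw term counts only. It assumes fixed-size token windows as a simple stand-in for the topical segmentation the system uses, and it omits the neural ranking model that would consume the matrix; the document text is hypothetical.

```python
def segment(tokens, size):
    """Split tokens into consecutive fixed-size segments (a simple stand-in
    for topical segmentation such as TextTiling)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def tilebar_matrix(query_terms, doc_tokens, size):
    """Query-by-segment interaction matrix: rows are query terms, columns
    are segments, and each cell counts the term's occurrences."""
    segments = segment(doc_tokens, size)
    return [[seg.count(term) for seg in segments] for term in query_terms]


doc = "query term query segment term term segment query".split()
matrix = tilebar_matrix(["query", "term"], doc, size=3)
for row in matrix:
    print(row)
# [2, 0, 1]
# [1, 2, 0]
```

Reading across a row shows where in the document a query term concentrates, which is the kind of positional matching pattern that a single document-level score discards.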
