Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

How Metaphors Impact Political Discourse: A Large-Scale Topic-Agnostic Study Using Neural Metaphor Detection

Apr 08, 2021
Vinodkumar Prabhakaran, Marek Rei, Ekaterina Shutova

Metaphors are widely used in political rhetoric as an effective framing device. While the efficacy of specific metaphors such as the war metaphor in political discourse has been documented before, those studies often rely on small number of hand-coded instances of metaphor use. Larger-scale topic-agnostic studies are required to establish the general persuasiveness of metaphors as a device, and to shed light on the broader patterns that guide their persuasiveness. In this paper, we present a large-scale data-driven study of metaphors used in political discourse. We conduct this study on a publicly available dataset of over 85K posts made by 412 US politicians in their Facebook public pages, up until Feb 2017. Our contributions are threefold: we show evidence that metaphor use correlates with ideological leanings in complex ways that depend on concurrent political events such as winning or losing elections; we show that posts with metaphors elicit more engagement from their audience overall even after controlling for various socio-political factors such as gender and political party affiliation; and finally, we demonstrate that metaphoricity is indeed the reason for increased engagement of posts, through a fine-grained linguistic analysis of metaphorical vs. literal usages of 513 words across 70K posts.

* The International AAAI Conference on Web and Social Media (ICWSM) 2021 
* Published at ICWSM 2021. Please cite that version for academic publications 

  Access Paper or Ask Questions

Automatic construction of Chinese herbal prescription from tongue image via CNNs and auxiliary latent therapy topics

Mar 01, 2018
Yang Hu, Guihua Wen, Huiqiang Liao, Changjun Wang, Dan Dai, Zhiwen Yu, Jun Zhang

The tongue image is an important physical information of human, it is of great importance to the diagnosis and treatment in clinical medicine. Herbal prescriptions are simple, noninvasive and low side effects, and are widely applied in China. Researches on automatic construction technology of herbal prescription based on tongue image have great significance for deep learning to explore the relevance from tongue image to herbal prescription, and can be applied to healthcare services in mobile medical system. In order to adapt to the tongue image in a variety of photographing environments and construct the herbal prescriptions, a neural network framework for prescriptions construction is designed, which includes single / double convolution channels and fully connected layers, and propose the mechanism of auxiliary therapy topic loss to model the therapy of Chinese doctors then alleviate the interference of sparse output labels to the diversity of results. The experimental data include the patient tongue images and their corresponding prescriptions from real world outpatient clinic, and the experiment results can generate the prescriptions that are close to the real samples, which verifies the feasibility of the proposed method for automatic construction of herbal prescription from tongue image. Also, provides a reference for automatic herbal prescription construction from more physical information (or integrated body information).

  Access Paper or Ask Questions

Computational analyses of the topics, sentiments, literariness, creativity and beauty of texts in a large Corpus of English Literature

Jan 12, 2022
Arthur M. Jacobs, Annette Kinder

The Gutenberg Literary English Corpus (GLEC, Jacobs, 2018a) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. In this study we address differences among the different literature categories in GLEC, as well as differences between authors. We report the results of three studies providing i) topic and sentiment analyses for six text categories of GLEC (i.e., children and youth, essays, novels, plays, poems, stories) and its >100 authors, ii) novel measures of semantic complexity as indices of the literariness, creativity and book beauty of the works in GLEC (e.g., Jane Austen's six novels), and iii) two experiments on text classification and authorship recognition using novel features of semantic complexity. The data on two novel measures estimating a text's literariness, intratextual variance and stepwise distance (van Cranenburgh et al., 2019) revealed that plays are the most literary texts in GLEC, followed by poems and novels. Computation of a novel index of text creativity (Gray et al., 2016) revealed poems and plays as the most creative categories with the most creative authors all being poets (Milton, Pope, Keats, Byron, or Wordsworth). We also computed a novel index of perceived beauty of verbal art (Kintsch, 2012) for the works in GLEC and predict that Emma is the theoretically most beautiful of Austen's novels. Finally, we demonstrate that these novel measures of semantic complexity are important features for text classification and authorship recognition with overall predictive accuracies in the range of .75 to .97. Our data pave the way for future computational and empirical studies of literature or experiments in reading psychology and offer multiple baselines and benchmarks for analysing and validating other book corpora.

* 37 pages, 12 figures 

  Access Paper or Ask Questions

Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation

Dec 12, 2021
Ao Liu, Congjian Luo, Naoaki Okazaki

Logical Natural Language Generation, i.e., generating textual descriptions that can be logically entailed by a structured table, has been a challenge due to the low fidelity of the generation. \citet{chen2020logic2text} have addressed this problem by annotating interim logical programs to control the generation contents and semantics, and presented the task of table-aware logical form to text (Logic2text) generation. However, although table instances are abundant in the real world, logical forms paired with textual descriptions require costly human annotation work, which limits the performance of neural models. To mitigate this, we propose topic-conditioned data augmentation (TopicDA), which utilizes GPT-2 to generate unpaired logical forms and textual descriptions directly from tables. We further introduce logical form generation (LG), a dual task of Logic2text that requires generating a valid logical form based on a text description of a table. We also propose a semi-supervised learning approach to jointly train a Logic2text and an LG model with both labeled and augmented data. The two models benefit from each other by providing extra supervision signals through back-translation. Experimental results on the Logic2text dataset and the LG task demonstrate that our approach can effectively utilize the augmented data and outperform supervised baselines by a substantial margin.

  Access Paper or Ask Questions

Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling

Aug 16, 2021
Md Imran Hossen, Ashraful Islam, Farzana Anowar, Eshtiak Ahmed, Mohammad Masudur Rahman

Due to the variety of cyber-attacks or threats, the cybersecurity community has been enhancing the traditional security control mechanisms to an advanced level so that automated tools can encounter potential security threats. Very recently a term, Cyber Threat Intelligence (CTI) has been represented as one of the proactive and robust mechanisms because of its automated cybersecurity threat prediction based on data. In general, CTI collects and analyses data from various sources e.g. online security forums, social media where cyber enthusiasts, analysts, even cybercriminals discuss cyber or computer security related topics and discovers potential threats based on the analysis. As the manual analysis of every such discussion i.e. posts on online platforms is time-consuming, inefficient, and susceptible to errors, CTI as an automated tool can perform uniquely to detect cyber threats. In this paper, our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques. To this end, we collect data from a real hacker forum and constructed two datasets: a binary dataset and a multi-class dataset. Our binary dataset contains two classes one containing cybersecurity-relevant posts and another one containing posts that are not related to security. This dataset is constructed using simple keyword search technique. Using a similar approach, we further categorize posts from security-relevant posts into five different threat categories. We then applied several machine learning classifiers along with deep neural network-based classifiers and use them on the datasets to compare their performances. We also tested the classifiers on a leaked dataset with labels named as our ground truth. We further explore the datasets using unsupervised techniques i.e. Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF).

  Access Paper or Ask Questions

Team Hitachi @ AutoMin 2021: Reference-free Automatic Minuting Pipeline with Argument Structure Construction over Topic-based Summarization

Dec 06, 2021
Atsuki Yamaguchi, Gaku Morio, Hiroaki Ozaki, Ken-ichi Yokote, Kenji Nagamatsu

This paper introduces the proposed automatic minuting system of the Hitachi team for the First Shared Task on Automatic Minuting (AutoMin-2021). We utilize a reference-free approach (i.e., without using training minutes) for automatic minuting (Task A), which first splits a transcript into blocks on the basis of topics and subsequently summarizes those blocks with a pre-trained BART model fine-tuned on a summarization corpus of chat dialogue. In addition, we apply a technique of argument mining to the generated minutes, reorganizing them in a well-structured and coherent way. We utilize multiple relevance scores to determine whether or not a minute is derived from the same meeting when either a transcript or another minute is given (Task B and C). On top of those scores, we train a conventional machine learning model to bind them and to make final decisions. Consequently, our approach for Task A achieve the best adequacy score among all submissions and close performance to the best system in terms of grammatical correctness and fluency. For Task B and C, the proposed model successfully outperformed a majority vote baseline.

* 8 pages, 4 figures 

  Access Paper or Ask Questions

Weakly Supervised Prototype Topic Model with Discriminative Seed Words: Modifying the Category Prior by Self-exploring Supervised Signals

Nov 20, 2021
Bing Wang, Yue Wang, Ximing Li, Jihong Ouyang

Dataless text classification, i.e., a new paradigm of weakly supervised learning, refers to the task of learning with unlabeled documents and a few predefined representative words of categories, known as seed words. The recent generative dataless methods construct document-specific category priors by using seed word occurrences only, however, such category priors often contain very limited and even noisy supervised signals. To remedy this problem, in this paper we propose a novel formulation of category prior. First, for each document, we consider its label membership degree by not only counting seed word occurrences, but also using a novel prototype scheme, which captures pseudo-nearest neighboring categories. Second, for each label, we consider its frequency prior knowledge of the corpus, which is also a discriminative knowledge for classification. By incorporating the proposed category prior into the previous generative dataless method, we suggest a novel generative dataless method, namely Weakly Supervised Prototype Topic Model (WSPTM). The experimental results on real-world datasets demonstrate that WSPTM outperforms the existing baseline methods.

  Access Paper or Ask Questions

Deep Sentiment Classification and Topic Discovery on Novel Coronavirus or COVID-19 Online Discussions: NLP Using LSTM Recurrent Neural Network Approach

Apr 24, 2020
Hamed Jelodar, Yongli Wang, Rita Orji, Hucheng Huang

Internet forums and public social media, such as online healthcare forums, provide a convenient channel for users (people/patients) concerned about health issues to discuss and share information with each other. In late December 2019, an outbreak of a novel coronavirus (infection from which results in the disease named COVID-19) was reported, and, due to the rapid spread of the virus in other parts of the world, the World Health Organization declared a state of emergency. In this paper, we used automated extraction of COVID-19 related discussions from social media and a natural language process (NLP) method based on topic modeling to uncover various issues related to COVID-19 from public opinions. Moreover, we also investigate how to use LSTM recurrent neural network for sentiment classification of COVID-19 comments. Our findings shed light on the importance of using public opinions and suitable computational techniques to understand issues surrounding COVID-19 and to guide related decision-making.

  Access Paper or Ask Questions

BioNLP-OST 2019 RDoC Tasks: Multi-grain Neural Relevance Ranking Using Topics and Attention Based Query-Document-Sentence Interactions

Oct 02, 2019
Yatin Chaudhary, Pankaj Gupta, Hinrich Schütze

This paper presents our system details and results of participation in the RDoC Tasks of BioNLP-OST 2019. Research Domain Criteria (RDoC) construct is a multi-dimensional and broad framework to describe mental health disorders by combining knowledge from genomics to behaviour. Non-availability of RDoC labelled dataset and tedious labelling process hinders the use of RDoC framework to reach its full potential in Biomedical research community and Healthcare industry. Therefore, Task-1 aims at retrieval and ranking of PubMed abstracts relevant to a given RDoC construct and Task-2 aims at extraction of the most relevant sentence from a given PubMed abstract. We investigate (1) attention based supervised neural topic model and SVM for retrieval and ranking of PubMed abstracts and, further utilize BM25 and other relevance measures for re-ranking, (2) supervised and unsupervised sentence ranking models utilizing multi-view representations comprising of query-aware attention-based sentence representation (QAR), bag-of-words (BoW) and TF-IDF. Our best systems achieved 1st rank and scored 0.86 mean average precision (mAP) and 0.58 macro average accuracy (MAA) in Task-1 and Task-2 respectively.

* EMNLP2019, 10 pages, 2 figures, 7 tables 

  Access Paper or Ask Questions