"Topic": models, code, and papers

Predicting Research Trends From Arxiv

Mar 07, 2019
Steffen Eger, Chao Li, Florian Netzer, Iryna Gurevych

We perform trend detection on two datasets of Arxiv papers, derived from its machine learning (cs.LG) and natural language processing (cs.CL) categories. Our approach is bottom-up: we first rank papers by their normalized citation counts, then group top-ranked papers into different categories based on the tasks that they pursue and the methods they use. We then analyze these resulting topics. We find that the dominant paradigms in cs.CL revolve around natural language generation problems and those in cs.LG revolve around reinforcement learning and adversarial principles. By extrapolation, we predict that these topics will remain leading problems/approaches in their fields in the short and mid term.

* Refresh workshop paper (December 2018) 
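
The first step described in the abstract above, ranking papers by normalized citation counts, can be sketched roughly as follows. The abstract does not say how the counts are normalized; z-scoring within each publication month is an assumption made here so that older papers, which have had more time to collect citations, do not dominate the ranking. The function name and the `month` and `citations` fields are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rank_by_normalized_citations(papers):
    """Sort papers by citation z-score within their publication month.

    `papers` is a list of dicts with hypothetical keys
    'title', 'month' (e.g. '2017-01') and 'citations'.
    """
    # Collect raw citation counts per publication month.
    by_month = defaultdict(list)
    for paper in papers:
        by_month[paper["month"]].append(paper["citations"])

    # Mean and population standard deviation for each month.
    stats = {m: (mean(c), pstdev(c)) for m, c in by_month.items()}

    def z_score(paper):
        mu, sigma = stats[paper["month"]]
        # A month where all counts are equal carries no ranking signal.
        return (paper["citations"] - mu) / sigma if sigma > 0 else 0.0

    # Highest normalized score first.
    return sorted(papers, key=z_score, reverse=True)
```

With this normalization, a recent paper with few citations in absolute terms can outrank an older paper with more, as long as it stands out among papers from the same month.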


Curiosity Based Exploration for Learning Terrain Models

Oct 24, 2013
Yogesh Girdhar, David Whitney, Gregory Dudek

We present a robotic exploration technique in which the goal is to learn a visual model that can distinguish between different terrains and other visual components in an unknown environment. We use ROST, a realtime online spatiotemporal topic modeling framework, to model these terrains using the observations made by the robot, and then use an information theoretic path planning technique to define the exploration path. We conduct experiments with aerial view and underwater datasets with millions of observations and varying path lengths, and find that paths that are biased towards locations with high topic perplexity produce better terrain models with high discriminative power, especially with paths of length close to the diameter of the world.

* 7 pages, 5 figures, submitted to ICRA 2014 
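
To make "topic perplexity" in the abstract above concrete, here is a minimal sketch of how the perplexity of a location's observations could be computed under a topic model and used to bias the choice of the next location. It uses the standard definition of perplexity (the exponential of the negative mean log-likelihood); the abstract does not describe how ROST or the path planner actually compute or use it, so the function names, the data layout, and the greedy selection rule are all assumptions.

```python
import math


def perplexity(word_ids, theta, phi):
    """Perplexity of observed visual words at one location.

    theta[k]  : P(topic k) at this location
    phi[k][w] : P(word w | topic k)
    """
    log_likelihood = 0.0
    for w in word_ids:
        # Marginal probability of the word under the topic mixture.
        p_w = sum(theta[k] * phi[k][w] for k in range(len(theta)))
        log_likelihood += math.log(p_w)
    return math.exp(-log_likelihood / len(word_ids))


def most_surprising(candidates, phi):
    """Pick the candidate location whose observations the model explains worst.

    `candidates` maps a location label to a (word_ids, theta) pair.
    This greedy rule is a stand-in for a real path planner.
    """
    return max(
        candidates,
        key=lambda loc: perplexity(candidates[loc][0], candidates[loc][1], phi),
    )
```

A location whose observations are well explained by the current model has perplexity close to 1, while one the model finds surprising scores higher and is therefore favored by the selection rule.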


State of the Art, Evaluation and Recommendations regarding "Document Processing and Visualization Techniques"

Dec 29, 2004
Martin Rajman, Martin Vesely, Pierre Andrews

Several Networks of Excellence have been set up in the framework of the European FP5 research program. Among these Networks of Excellence, the NEMIS project focuses on the field of Text Mining. Within this field, document processing and visualization was identified as one of the key topics and the WG1 working group was created in the NEMIS project, to carry out a detailed survey of techniques associated with the text mining process and to identify the relevant research topics in related research areas. In this document we present the results of this comprehensive survey. The report includes a description of the current state-of-the-art and practice, a roadmap for follow-up research in the identified areas, and recommendations for anticipated technological development in the domain of text mining.

* 54 pages, Report of Working Group 1 for the European Network of Excellence (NoE) in Text Mining and its Applications in Statistics (NEMIS) 


Analyse scientométrique du domaine de l'infectiologie de 2000 à 2020

Feb 15, 2022
Lesya Baudoin, Anne Glanard, Abdelghani Maddi, Wilfriedo Mescheba, Frédérique Sachwald

Research on infectious diseases constitutes a transversal scientific field. A specific corpus is designed by combining a controlled language (Medline MeSH thesaurus) and the categorization of journals (Web of Science). From this global corpus, the article characterizes the publications from the top 20 countries publishing in the field and their evolution between 2000 and 2020. Topic maps show the research themes within the field of infectious diseases both in the world and in France. The explosion of publications on Covid-19 in 2020 has a clearly visible impact on the topic map in infectious diseases and changes the position of some countries in this field of research. The conclusion points to issues for further research as more complete data become available on the Covid-19 period.

* in French. Histoire de la recherche contemporaine : la revue du Comité pour l'histoire du CNRS, CNRS Éditions, 2021 


Persian Keyphrase Generation Using Sequence-to-Sequence Models

Sep 25, 2020
Ehsan Doostmohammadi, Mohammad Hadi Bokaei, Hossein Sameti

Keyphrases are a very short summary of an input text and provide the main subjects discussed in the text. Keyphrase extraction is a useful upstream task and can be used in various natural language processing problems, for example, text summarization and information retrieval, to name a few. However, not all the keyphrases are explicitly mentioned in the body of the text. In real-world examples there are always some topics that are discussed implicitly. Extracting such keyphrases requires a generative approach, which is adopted here. In this paper, we try to tackle the problem of keyphrase generation and extraction from news articles using deep sequence-to-sequence models. These models significantly outperform the conventional methods such as Topic Rank, KPMiner, and KEA in the task of keyphrase extraction.


COVID-19 Kaggle Literature Organization

Aug 04, 2020
Maksim Ekin Eren, Nick Solovyev, Edward Raff, Charles Nicholas, Ben Johnson

The world has faced the devastating outbreak of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), or COVID-19, in 2020. Research in the subject matter was fast-tracked to such a point that scientists were struggling to keep up with new findings. With this increase in the scientific literature came a need to present that literature so that researchers and health professionals could find the information they require. We describe an approach to organize and visualize the scientific literature on or related to COVID-19 using machine learning techniques so that papers on similar topics are located close to each other. By doing so, the navigation of topics and related papers is simplified. This approach was utilized and implemented by the authors using the widely recognized CORD-19 dataset.

* Maksim Ekin Eren, Nick Solovyev and Ben Johnson. 2020. COVID-19 Kaggle Literature Organization. In DocEng20: ACM Symposium on Document Engineering, September 29, 2020 to October 2, 2020, San Jose, CA, USA. ACM, New York, NY, USA, 4 pages 


Communication-Efficient Parallel Belief Propagation for Latent Dirichlet Allocation

Jun 11, 2012
Jian-feng Yan, Zhi-Qiang Liu, Yang Gao, Jia Zeng

This paper presents a novel communication-efficient parallel belief propagation (CE-PBP) algorithm for training latent Dirichlet allocation (LDA). Based on the synchronous belief propagation (BP) algorithm, we first develop a parallel belief propagation (PBP) algorithm on the parallel architecture. Because extensive communication delay often lowers the efficiency of parallel topic modeling, we further use Zipf's law to reduce the total communication cost in PBP. Extensive experiments on different data sets demonstrate that CE-PBP achieves higher topic modeling accuracy and reduces communication cost by more than 80% compared with the state-of-the-art parallel Gibbs sampling (PGS) algorithm.

* 9 pages, 5 figures 


The state-of-the-art in text-based automatic personality prediction

Oct 04, 2021
Ali-Reza Feizi-Derakhshi, Mohammad-Reza Feizi-Derakhshi, Majid Ramezani, Narjes Nikzad-Khasmakhi, Meysam Asgari-Chenaghlu, Taymaz Akan, Mehrdad Ranjbar-Khadivi, Elnaz Zafarni-Moattar, Zoleikha Jahanbakhsh-Naghadeh

Personality detection is an old topic in psychology, and Automatic Personality Prediction (or Perception) (APP) is the automated (computational) forecasting of personality from different types of human-generated or exchanged content (such as text, speech, image, and video). The principal objective of this study is to offer a shallow (overall) review of natural language processing approaches to APP since 2010. With the advent of deep learning, followed by transfer learning and pre-trained models in NLP, APP has become a hot research topic, so in this review methods are categorized into three groups: pre-trained independent, pre-trained model based, and multimodal approaches. Also, to achieve a comprehensive comparison, reported results are informed by datasets.


Recent Trends in Deep Learning Based Personality Detection

Aug 27, 2019
Yash Mehta, Navonil Majumder, Alexander Gelbukh, Erik Cambria

Recently, the automatic prediction of personality traits has received a lot of attention. Specifically, personality trait prediction from multimodal data has emerged as a hot topic within the field of affective computing. In this paper, we review significant machine learning models which have been employed for personality detection, with an emphasis on deep learning-based methods. This review paper provides an overview of the most popular approaches to automated personality detection, various computational datasets, industrial applications, and state-of-the-art machine learning models for personality detection, with a specific focus on multimodal approaches. Personality detection is a very broad and diverse topic: this survey focuses only on computational approaches and leaves out psychological studies on personality detection.


Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks

Apr 04, 2016
Matthew Francis-Landau, Greg Durrett, Dan Klein

A key challenge in entity linking is making effective use of contextual information to disambiguate mentions that might refer to different entities in different contexts. We present a model that uses convolutional neural networks to capture semantic correspondence between a mention's context and a proposed target entity. These convolutional networks operate at multiple granularities to exploit various kinds of topic information, and their rich parameterization gives them the capacity to learn which n-grams characterize different topics. We combine these networks with a sparse linear model to achieve state-of-the-art performance on multiple entity linking datasets, outperforming the prior systems of Durrett and Klein (2014) and Nguyen et al. (2014).

* Accepted at NAACL 2016 
