"Topic": models, code, and papers

Improving Editorial Workflow and Metadata Quality at Springer Nature

Mar 24, 2021
Angelo A. Salatino, Francesco Osborne, Aliaksandr Birukou, Enrico Motta

Identifying the research topics that best describe the scope of a scientific publication is a crucial task for editors, in particular because the quality of these annotations determines how effectively users are able to discover the right content in online libraries. For this reason, Springer Nature, the world's largest academic book publisher, has traditionally entrusted this task to their most expert editors. These editors manually analyse all new books, possibly including hundreds of chapters, and produce a list of the most relevant topics. Hence, this process has traditionally been very expensive, time-consuming, and confined to a few senior editors. For these reasons, back in 2016 we developed Smart Topic Miner (STM), an ontology-driven application that assists the Springer Nature editorial team in annotating the volumes of all books covering conference proceedings in Computer Science. Since then, STM has been regularly used by editors in Germany, China, Brazil, India, and Japan, for a total of about 800 volumes per year. Over the past three years the initial prototype has iteratively evolved in response to feedback from the users and evolving requirements. In this paper we present the most recent version of the tool and describe the evolution of the system over the years, the key lessons learnt, and the impact on the Springer Nature workflow. In particular, our solution has drastically reduced the time needed to annotate proceedings and significantly improved their discoverability, resulting in 9.3 million additional downloads. We also present a user study involving 9 editors, which yielded excellent results in terms of usability, and report an evaluation of the new topic classifier used by STM, which outperforms previous versions in recall and F-measure.

* In: The Semantic Web - ISWC 2019. Lecture Notes in Computer Science, vol 11779. Springer, Cham 

Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition

Jan 09, 2021
Fuyu Wang, Xiaodan Liang, Lin Xu, Liang Lin

Beyond generating long and topic-coherent paragraphs in traditional captioning tasks, the medical image report composition task poses more task-oriented challenges by requiring both highly accurate medical term diagnosis and multiple heterogeneous forms of information, including impression and findings. Current methods often generate the most common sentences due to dataset bias for individual cases, regardless of whether the sentences properly capture key entities and relationships. Such limitations severely hinder their applicability and generalization capability in medical report composition, where the most critical sentences lie in the descriptions of abnormal diseases that are relatively rare. Moreover, some medical terms appearing in one report are often entangled with each other and co-occur, e.g. symptoms associated with a specific disease. To enforce the semantic consistency of medical terms to be incorporated into the final reports and encourage the sentence generation for rare abnormal descriptions, we propose a novel framework that unifies template retrieval and sentence generation to handle both common and rare abnormalities while ensuring the semantic coherency among the detected medical terms. Specifically, our approach exploits hybrid-knowledge co-reasoning: i) explicit relationships among all abnormal medical terms to induce the visual attention learning and topic representation encoding for better topic-oriented symptom descriptions; ii) an adaptive generation mode that switches between template retrieval and sentence generation according to a contextual topic encoder. Experimental results on two medical report benchmarks demonstrate the superiority of the proposed framework in terms of both human and metric-based evaluation.

* To appear in IEEE Transactions on Cybernetics 2021. We attempt to resolve the challenging medical report composition task by i) enforcing the semantic consistency of medical terms to be incorporated into the final reports; and ii) encouraging the sentence generation for rare abnormal descriptions 

Dependent Hierarchical Normalized Random Measures for Dynamic Topic Modeling

Jun 18, 2012
Changyou Chen, Nan Ding, Wray Buntine

We develop dependent hierarchical normalized random measures and apply them to dynamic topic modeling. The dependency arises via superposition, subsampling and point transition on the underlying Poisson processes of these measures. The measures used include normalised generalised Gamma processes that demonstrate power law properties, unlike Dirichlet processes used previously in dynamic topic modeling. Inference for the model includes adapting a recently developed slice sampler to directly manipulate the underlying Poisson process. Experiments performed on news, blogs, academic and Twitter collections demonstrate the technique gives superior perplexity over a number of previous models.

* ICML2012 

Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

Apr 24, 2020
Fanghui Liu, Xiaolin Huang, Yudong Chen, Johan A. K. Suykens

Random features is one of the most sought-after research topics in statistical machine learning to speed up kernel methods in large-scale situations. Related works have won the NeurIPS test-of-time award in 2017 and were an ICML best paper finalist in 2019. However, comprehensive studies on this topic seem to be missing, which results in different, sometimes conflicting, statements. In this survey, we attempt to thoroughly and systematically review the past ten years of work on random features regarding algorithmic and theoretical aspects. First, the fundamental characteristics, primary motivations, and contributions of representative random features based algorithms are summarized according to their sampling scheme, learning procedure, variance reduction, and exploitation of training data. Second, we review theoretical results of random features to answer the key question: how many random features are needed to ensure a high approximation quality or no loss of empirical risk and expected risk in a learning estimator. Third, popular random features based algorithms are comprehensively evaluated on several large-scale benchmark datasets on the approximation quality and the prediction performance for classification and regression. Last, we link random features to current over-parameterized deep neural networks (DNNs) by investigating their relationships, the usage of random features to analyze over-parameterized networks, and the gap in the current theoretical results. As a result, this survey could be a gentle user guide for practitioners to follow this topic, apply representative algorithms, and grasp theoretical results under various technical assumptions. We think that this survey helps to facilitate a discussion on ongoing issues for this topic, and specifically, it sheds light on promising research directions.

* The current version is not ready to be made publicly available 
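
As a rough illustration of the idea surveyed in this entry, the sketch below approximates an RBF kernel with random Fourier features, one classic random-features construction. It is not taken from the survey: the function names (`make_rff_map`, `rbf_kernel`, `dot`), the kernel parameterization `exp(-gamma * ||x - y||^2)`, and the feature count are assumptions chosen for the example.

```python
import math
import random


def make_rff_map(dim, num_features, gamma, seed=0):
    """Return z such that dot(z(x), z(y)) approximates exp(-gamma * ||x - y||^2).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # The spectral density of this RBF kernel is Gaussian with variance 2 * gamma.
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(num_features)]
    # Random phase offsets, uniform on [0, 2*pi).
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only as a reference."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    z = make_rff_map(dim=3, num_features=4000, gamma=0.5, seed=1)
    x, y = [0.1, 0.2, 0.3], [0.4, -0.1, 0.0]
    print("approximate:", dot(z(x), z(y)))
    print("exact:      ", rbf_kernel(x, y, 0.5))
```

The approximation error shrinks roughly as one over the square root of the number of features, which is the "how many random features are needed" question the abstract refers to.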

Online Interactive Collaborative Filtering Using Multi-Armed Bandit with Dependent Arms

Aug 11, 2017
Qing Wang, Chunqiu Zeng, Wubai Zhou, Tao Li, Larisa Shwartz, Genady Ya. Grabarnik

Online interactive recommender systems strive to promptly suggest to consumers appropriate items (e.g., movies, news articles) according to the current context, including both the consumer and item content information. However, such context information is often unavailable in practice for the recommendation, where only the users' interaction data on items can be utilized. Moreover, the lack of interaction records, especially for new users and items, further worsens recommendation performance. To address these issues, collaborative filtering (CF), one of the recommendation techniques relying on the interaction data only, as well as online multi-armed bandit mechanisms, capable of achieving the balance between exploitation and exploration, are adopted in online interactive recommendation settings, by assuming independent items (i.e., arms). Nonetheless, the assumption rarely holds in reality, since real-world items tend to be correlated with each other (e.g., two articles with similar topics). In this paper, we study online interactive collaborative filtering problems by considering the dependencies among items. We explicitly formulate the item dependencies as clusters on arms, where the arms within a single cluster share similar latent topics. In light of topic modeling techniques, we come up with a generative model to generate the items from their underlying topics. Furthermore, an efficient online algorithm based on particle learning is developed for inferring both latent parameters and states of our model. Additionally, our inferred model can be naturally integrated with existing multi-armed selection strategies in the online interactive collaborative filtering setting. Empirical studies on two real-world applications, online recommendations of movies and news, demonstrate both the effectiveness and efficiency of the proposed approach.

* Recommender systems; Interactive collaborative filtering; Topic modeling; Cold-start problem; Particle learning; 10 pages 
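
For readers unfamiliar with the independent-arm setting that this abstract contrasts itself against, here is a minimal sketch of Thompson sampling over Bernoulli arms. It illustrates only the baseline assumption of independent items; it is not the paper's dependent-arm model or its particle learning algorithm. The function name `thompson_sampling` and the reward rates are hypothetical values chosen for the example.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Run Bernoulli Thompson sampling and return how often each arm was pulled.

    Each arm is treated as independent, with its own Beta(1, 1) prior.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [1] * n  # Beta alpha parameters (prior plus observed rewards)
    failures = [1] * n   # Beta beta parameters (prior plus observed non-rewards)
    pulls = [0] * n
    for _ in range(rounds):
        # Exploration comes from sampling a plausible reward rate per arm.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n)]
        # Exploitation: play the arm whose sampled rate is highest.
        arm = max(range(n), key=samples.__getitem__)
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Assumed click-through rates for three items (arms).
    print(thompson_sampling([0.2, 0.5, 0.8], rounds=2000, seed=0))
```

Because every arm keeps its own posterior, feedback on one item says nothing about similar items, which is the limitation the abstract aims to address by clustering arms.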

Katecheo: A Portable and Modular System for Multi-Topic Question Answering

Jul 01, 2019
Shirish Hirekodi, Seban Sunny, Leonard Topno, Alwin Daniel, Daniel Whitenack, Reuben Skewes, Stuart Cranney

We introduce a modular system that can be deployed on any Kubernetes cluster for question answering via REST API. This system, called Katecheo, includes four configurable modules that collectively enable identification of questions, classification of those questions into topics, a search of knowledge base articles, and reading comprehension. We demonstrate the system using publicly available, pre-trained models and knowledge base articles extracted from Stack Exchange sites. However, users can extend the system to any number of topics, or domains, without the need to modify any of the model serving code. All components of the system are open source and available under a permissive Apache 2 License.

* EMNLP 2019 system demo submission, 6 pages, 2 figures, 1 table 

Toward Interpretable Topic Discovery via Anchored Correlation Explanation

Jun 22, 2016
Kyle Reing, David C. Kale, Greg Ver Steeg, Aram Galstyan

Many predictive tasks, such as diagnosing a patient based on their medical chart, are ultimately defined by the decisions of human experts. Unfortunately, encoding experts' knowledge is often time consuming and expensive. We propose a simple way to use fuzzy and informal knowledge from experts to guide discovery of interpretable latent topics in text. The underlying intuition of our approach is that latent factors should be informative about both correlations in the data and a set of relevance variables specified by an expert. Mathematically, this approach is a combination of the information bottleneck and Total Correlation Explanation (CorEx). We give a preliminary evaluation of Anchored CorEx, showing that it produces more coherent and interpretable topics on two distinct corpora.

* presented at 2016 ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, New York, NY 

End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis

Mar 18, 2019
Lin Xu, Qixian Zhou, Ke Gong, Xiaodan Liang, Jianheng Tang, Liang Lin

Beyond current conversational chatbots or task-oriented dialogue systems that have attracted increasing attention, we move forward to develop a dialogue system for automatic medical diagnosis that converses with patients to collect additional symptoms beyond their self-reports and automatically makes a diagnosis. Besides the challenges for conversational dialogue systems (e.g. topic transition coherency and question understanding), automatic medical diagnosis poses more critical requirements for dialogue rationality in the context of medical knowledge and symptom-disease relations. Existing dialogue systems (Madotto, Wu, and Fung 2018; Wei et al. 2018; Li et al. 2017) mostly rely on data-driven learning and are unable to encode an additional expert knowledge graph. In this work, we propose an End-to-End Knowledge-routed Relational Dialogue System (KR-DS) that seamlessly incorporates a rich medical knowledge graph into the topic transition in dialogue management, and makes it cooperate with natural language understanding and natural language generation. A novel Knowledge-routed Deep Q-network (KR-DQN) is introduced to manage topic transitions, which integrates a relational refinement branch for encoding relations among different symptoms and symptom-disease pairs, and a knowledge-routed graph branch for topic decision-making. Extensive experiments on a public medical dialogue dataset show that our KR-DS significantly outperforms state-of-the-art methods (by more than 8% in diagnosis accuracy). We further show the superiority of our KR-DS on a newly collected medical dialogue system dataset, which is more challenging because it retains the original self-reports and conversational data between patients and doctors.

* 8 pages, 5 figures, AAAI 
