Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Chemical Identification and Indexing in PubMed Articles via BERT and Text-to-Text Approaches

Nov 30, 2021
Virginia Adams, Hoo-Chang Shin, Carol Anderson, Bo Liu, Anas Abidin

The Biocreative VII Track-2 challenge consists of named entity recognition, entity-linking (or entity-normalization), and topic indexing tasks -- with entities and topics limited to chemicals for this challenge. Named entity recognition is a well-established problem and we achieve our best performance with BERT-based BioMegatron models. We extend our BERT-based approach to the entity linking task. After the second stage of pretraining BioBERT with a metric-learning loss strategy called self-alignment pretraining (SAP), we link entities based on the cosine similarity between their SAP-BioBERT word embeddings. Despite the success of our named entity recognition experiments, we find the chemical indexing task generally more challenging. In addition to conventional NER methods, we attempt both named entity recognition and entity linking with a novel text-to-text or "prompt" based method that uses generative language models such as T5 and GPT. We achieve encouraging results with this new approach.

* Submission to the BioCreative VII challenge - Track-2 

  Access Paper or Ask Questions

When Creators Meet the Metaverse: A Survey on Computational Arts

Nov 26, 2021
Lik-Hang Lee, Zijun Lin, Rui Hu, Zhengya Gong, Abhishek Kumar, Tangyao Li, Sijia Li, Pan Hui

The metaverse, enormous virtual-physical cyberspace, has brought unprecedented opportunities for artists to blend every corner of our physical surroundings with digital creativity. This article conducts a comprehensive survey on computational arts, in which seven critical topics are relevant to the metaverse, describing novel artworks in blended virtual-physical realities. The topics first cover the building elements for the metaverse, e.g., virtual scenes and characters, auditory, textual elements. Next, several remarkable types of novel creations in the expanded horizons of metaverse cyberspace have been reflected, such as immersive arts, robotic arts, and other user-centric approaches fuelling contemporary creative outputs. Finally, we propose several research agendas: democratising computational arts, digital privacy, and safety for metaverse artists, ownership recognition for digital artworks, technological challenges, and so on. The survey also serves as introductory material for artists and metaverse technologists to begin creations in the realm of surrealistic cyberspace.

* Submitted to ACM Computing Surveys, 36 pages 

  Access Paper or Ask Questions

A Survey on Neural Speech Synthesis

Jul 23, 2021
Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry. As the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. In this paper, we conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends. We focus on the key components in neural TTS, including text analysis, acoustic models and vocoders, and several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS, etc. We further summarize resources related to TTS (e.g., datasets, opensource implementations) and discuss future research directions. This survey can serve both academic researchers and industry practitioners working on TTS.

* A comprehensive survey on TTS, 63 pages, 18 tables, 7 figures, 457 references 

  Access Paper or Ask Questions

The Kaleidoscope of Privacy: Differences across French, German, UK, and US GDPR Media Discourse

Mar 31, 2021
Mary Sanford, Taha Yasseri

Conceptions of privacy differ by culture. In the Internet age, digital tools continuously challenge the way users, technologists, and governments define, value, and protect privacy. National and supranational entities attempt to regulate privacy and protect data managed online. The European Union passed the General Data Protection Regulation (GDPR), which took effect on 25 May 2018. The research presented here draws on two years of media reporting on GDPR from French, German, UK, and US sources. We use the unsupervised machine learning method of topic modelling to compare the thematic structure of the news articles across time and geographic regions. Our work emphasises the relevance of regional differences regarding valuations of privacy and potential obstacles to the implementation of unilateral data protection regulation such as GDPR. We find that the topics and trends over time in GDPR media coverage of the four countries reflect the differences found across their traditional privacy cultures.

* Under Review 

  Access Paper or Ask Questions

Shift-of-Perspective Identification Within Legal Cases

Jul 17, 2019
Gathika Ratnayaka, Thejan Rupasinghe, Nisansa de Silva, Viraj Salaka Gamage, Menuka Warushavithana, Amal Shehan Perera

Arguments, counter-arguments, facts, and evidence obtained via documents related to previous court cases are of essential need for legal professionals. Therefore, the process of automatic information extraction from documents containing legal opinions related to court cases can be considered to be of significant importance. This study is focused on the identification of sentences in legal opinion texts which convey different perspectives on a certain topic or entity. We combined several approaches based on semantic analysis, open information extraction, and sentiment analysis to achieve our objective. Then, our methodology was evaluated with the help of human judges. The outcomes of the evaluation demonstrate that our system is successful in detecting situations where two sentences deliver different opinions on the same topic or entity. The proposed methodology can be used to facilitate other information extraction tasks related to the legal domain. One such task is the automated detection of counter arguments for a given argument. Another is the identification of opponent parties in a court case.

  Access Paper or Ask Questions

Quantum Latent Semantic Analysis

Mar 07, 2019
Fabio A. González, Juan C. Caicedo

The main goal of this paper is to explore latent topic analysis (LTA), in the context of quantum information retrieval. LTA is a valuable technique for document analysis and representation, which has been extensively used in information retrieval and machine learning. Different LTA techniques have been proposed, some based on geometrical modeling (such as latent semantic analysis, LSA) and others based on a strong statistical foundation. However, these two different approaches are not usually mixed. Quantum information retrieval has the remarkable virtue of combining both geometry and probability in a common principled framework. We built on this quantum framework to propose a new LTA method, which has a clear geometrical motivation but also supports a well-founded probabilistic interpretation. An initial exploratory experimentation was performed on three standard data sets. The results show that the proposed method outperforms LSA on two of the three datasets. These results suggests that the quantum-motivated representation is an alternative for geometrical latent topic modeling worthy of further exploration.

* ICTIR2011 International Conference on the Theory of Information Retrieval 

  Access Paper or Ask Questions

Exploring the context of recurrent neural network based conversational agents

Jan 31, 2019
Raffaele Piccini, Gerasimos Spanakis

Conversational agents have begun to rise both in the academic (in terms of research) and commercial (in terms of applications) world. This paper investigates the task of building a non-goal driven conversational agent, using neural network generative models and analyzes how the conversation context is handled. It compares a simpler Encoder-Decoder with a Hierarchical Recurrent Encoder-Decoder architecture, which includes an additional module to model the context of the conversation using previous utterances information. We found that the hierarchical model was able to extract relevant context information and include them in the generation of the output. However, it performed worse (35-40%) than the simple Encoder-Decoder model regarding both grammatically correct output and meaningful response. Despite these results, experiments demonstrate how conversations about similar topics appear close to each other in the context space due to the increased frequency of specific topic-related words, thus leaving promising directions for future research and how the context of a conversation can be exploited.

* Accepted at ICAART 2019, 10 pages 

  Access Paper or Ask Questions

Computational Analysis of Insurance Complaints: GEICO Case Study

Jun 26, 2018
Amir Karami, Noelle M. Pendergraft

The online environment has provided a great opportunity for insurance policyholders to share their complaints with respect to different services. These complaints can reveal valuable information for insurance companies who seek to improve their services; however, analyzing a huge number of online complaints is a complicated task for human and must involve computational methods to create an efficient process. This research proposes a computational approach to characterize the major topics of a large number of online complaints. Our approach is based on using the topic modeling approach to disclose the latent semantic of complaints. The proposed approach deployed on thousands of GEICO negative reviews. Analyzing 1,371 GEICO complaints indicates that there are 30 major complains in four categories: (1) customer service, (2) insurance coverage, paperwork, policy, and reports, (3) legal issues, and (4) costs, estimates, and payments. This research approach can be used in other applications to explore a large number of reviews.

  Access Paper or Ask Questions