Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Real-Time Text Detection and Recognition

Oct 31, 2020
Shuonan Pei, Mingzhi Zhu

Inrecentyears,ConvolutionalNeuralNet-work(CNN) is quite a popular topic, as it is a powerful andintelligent technique that can be applied in various fields.The YOLO is a technique that uses the algorithms for real-time text detection tasks. However, issues like, photometricdistortion and geometric distortion, could affect the systemYOLO accuracy and cause system failure. Therefore, thereare improvements that can make the system work better. Inthis paper, we are going to present our solution - a potentialsolution of a fast and accurate real-time text direction andrecognition system. The paper covers the topic of Real-TimeText detection and recognition in three major areas: 1. videoand image preprocess, 2. Text detection, 3. Text recognition. Asa mature technique, there are many existing methods that canpotentially improve the solution. We will go through some ofthose existing methods in the literature review session. In thisway, we are presenting an industrial strength, high-accuracy,Real-Time Text Detection and recognition tool.

  Access Paper or Ask Questions

XREF: Entity Linking for Chinese News Comments with Supplementary Article Reference

Jun 24, 2020
Xinyu Hua, Lei Li, Lifeng Hua, Lu Wang

Automatic identification of mentioned entities in social media posts facilitates quick digestion of trending topics and popular opinions. Nonetheless, this remains a challenging task due to limited context and diverse name variations. In this paper, we study the problem of entity linking for Chinese news comments given mentions' spans. We hypothesize that comments often refer to entities in the corresponding news article, as well as topics involving the entities. We therefore propose a novel model, XREF, that leverages attention mechanisms to (1) pinpoint relevant context within comments, and (2) detect supporting entities from the news article. To improve training, we make two contributions: (a) we propose a supervised attention loss in addition to the standard cross entropy, and (b) we develop a weakly supervised training scheme to utilize the large-scale unlabeled corpus. Two new datasets in entertainment and product domains are collected and annotated for experiments. Our proposed method outperforms previous methods on both datasets.

* Accepted to AKBC2020, link to openreview: 

  Access Paper or Ask Questions

Question Answering over Curated and Open Web Sources

Apr 28, 2020
Rishiraj Saha Roy, Avishek Anand

The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial would cover the highlights of this really active period of growth for QA to give the audience a grasp over the families of algorithms that are currently being used. We partition research contributions by the underlying source from where answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic: like the complexity of questions addressed, and degrees of explainability and interactivity introduced in the systems. We would conclude the tutorial with the most promising emerging trends in the expanse of QA, that would help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA in SIGIR 2016, and we believe that this timely overview will indeed benefit a large number of conference participants.

* SIGIR 2020 Tutorial 

  Access Paper or Ask Questions

Ruuh: A Deep Learning Based Conversational Social Agent

Oct 22, 2018
Sonam Damani, Nitya Raviprakash, Umang Gupta, Ankush Chatterjee, Meghana Joshi, Khyatti Gupta, Kedhar Nath Narahari, Puneet Agrawal, Manoj Kumar Chinnakotla, Sneha Magapu, Abhishek Mathur

Dialogue systems and conversational agents are becoming increasingly popular in the modern society but building an agent capable of holding intelligent conversation with its users is a challenging problem for artificial intelligence. In this demo, we demonstrate a deep learning based conversational social agent called "Ruuh" ( designed by a team at Microsoft India to converse on a wide range of topics. Ruuh needs to think beyond the utilitarian notion of merely generating "relevant" responses and meet a wider range of user social needs, like expressing happiness when user's favorite team wins, sharing a cute comment on showing the pictures of the user's pet and so on. The agent also needs to detect and respond to abusive language, sensitive topics and trolling behavior of the users. Many of these problems pose significant research challenges which will be demonstrated in our demo. Our agent has interacted with over 2 million real world users till date which has generated over 150 million user conversations.

* 2 pages, 1 figure 

  Access Paper or Ask Questions

Aspect Sentiment Model for Micro Reviews

Jun 14, 2018
Reinald Kim Amplayo, Seung-won Hwang

This paper aims at an aspect sentiment model for aspect-based sentiment analysis (ABSA) focused on micro reviews. This task is important in order to understand short reviews majority of the users write, while existing topic models are targeted for expert-level long reviews with sufficient co-occurrence patterns to observe. Current methods on aggregating micro reviews using metadata information may not be effective as well due to metadata absence, topical heterogeneity, and cold start problems. To this end, we propose a model called Micro Aspect Sentiment Model (MicroASM). MicroASM is based on the observation that short reviews 1) are viewed with sentiment-aspect word pairs as building blocks of information, and 2) can be clustered into larger reviews. When compared to the current state-of-the-art aspect sentiment models, experiments show that our model provides better performance on aspect-level tasks such as aspect term extraction and document-level tasks such as sentiment classification.

* Data Mining (ICDM), 2017 IEEE International Conference on 
* ICDM 2017 

  Access Paper or Ask Questions

Crowded Scene Analysis: A Survey

Feb 06, 2015
Teng Li, Huan Chang, Meng Wang, Bingbing Ni, Richang Hong, Shuicheng Yan

Automated scene analysis has been a topic of great interest in computer vision and cognitive science. Recently, with the growth of crowd phenomena in the real world, crowded scene analysis has attracted much attention. However, the visual occlusions and ambiguities in crowded scenes, as well as the complex behaviors and scene semantics, make the analysis a challenging task. In the past few years, an increasing number of works on crowded scene analysis have been reported, covering different aspects including crowd motion pattern learning, crowd behavior and activity analysis, and anomaly detection in crowds. This paper surveys the state-of-the-art techniques on this topic. We first provide the background knowledge and the available features related to crowded scenes. Then, existing models, popular algorithms, evaluation protocols, as well as system performance are provided corresponding to different aspects of crowded scene analysis. We also outline the available datasets for performance evaluation. Finally, some research problems and promising future directions are presented with discussions.

* 20 pages in IEEE Transactions on Circuits and Systems for Video Technology, 2015 

  Access Paper or Ask Questions

EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation

Dec 09, 2020
Qi Zhou, Haipeng Chen, Yitao Zheng, Zhen Wang

As one of the most powerful topic models, Latent Dirichlet Allocation (LDA) has been used in a vast range of tasks, including document understanding, information retrieval and peer-reviewer assignment. Despite its tremendous popularity, the security of LDA has rarely been studied. This poses severe risks to security-critical tasks such as sentiment analysis and peer-reviewer assignment that are based on LDA. In this paper, we are interested in knowing whether LDA models are vulnerable to adversarial perturbations of benign document examples during inference time. We formalize the evasion attack to LDA models as an optimization problem and prove it to be NP-hard. We then propose a novel and efficient algorithm, EvaLDA to solve it. We show the effectiveness of EvaLDA via extensive empirical evaluations. For instance, in the NIPS dataset, EvaLDA can averagely promote the rank of a target topic from 10 to around 7 by only replacing 1% of the words with similar words in a victim document. Our work provides significant insights into the power and limitations of evasion attacks to LDA models.

* Accepted for publication at AAAI'2021. 10 pages 

  Access Paper or Ask Questions

Asymmetrical Hierarchical Networks with Attentive Interactions for Interpretable Review-Based Recommendation

Dec 18, 2019
Xin Dong, Jingchao Ni, Wei Cheng, Zhengzhang Chen, Bo Zong, Dongjin Song, Yanchi Liu, Haifeng Chen, Gerard de Melo

Recently, recommender systems have been able to emit substantially improved recommendations by leveraging user-provided reviews. Existing methods typically merge all reviews of a given user or item into a long document, and then process user and item documents in the same manner. In practice, however, these two sets of reviews are notably different: users' reviews reflect a variety of items that they have bought and are hence very heterogeneous in their topics, while an item's reviews pertain only to that single item and are thus topically homogeneous. In this work, we develop a novel neural network model that properly accounts for this important difference by means of asymmetric attentive modules. The user module learns to attend to only those signals that are relevant with respect to the target item, whereas the item module learns to extract the most salient contents with regard to properties of the item. Our multi-hierarchical paradigm accounts for the fact that neither are all reviews equally useful, nor are all sentences within each review equally pertinent. Extensive experimental results on a variety of real datasets demonstrate the effectiveness of our method.

* AAAI 2020 

  Access Paper or Ask Questions

Clinically Accurate Chest X-Ray Report Generation

Apr 04, 2019
Guanxiong Liu, Tzu-Ming Harry Hsu, Matthew McDermott, Willie Boag, Wei-Hung Weng, Peter Szolovits, Marzyeh Ghassemi

The automatic generation of radiology reports given medical radiographs has significant potential to operationally and clinically improve patient care. A number of prior works have focused on this problem, employing advanced methods from computer vision and natural language generation to produce readable reports. However, these works often fail to account for the particular nuances of the radiology domain, and, in particular, the critical importance of clinical accuracy in the resulting generated reports. In this work, we present a domain-aware automatic chest X-Ray radiology report generation system which first predicts what topics will be discussed in the report, then conditionally generates sentences corresponding to these topics. The resulting system is fine-tuned using reinforcement learning, considering both readability and clinical accuracy, as assessed by the proposed Clinically Coherent Reward. We verify this system on two datasets, Open-I and MIMIC-CXR, and demonstrate that our model offers marked improvements on both language generation metrics and CheXpert assessed accuracy over a variety of competitive baselines.

  Access Paper or Ask Questions

Open Source Automatic Speech Recognition for German

Jul 26, 2018
Benjamin Milde, Arne Köhn

High quality Automatic Speech Recognition (ASR) is a prerequisite for speech-based applications and research. While state-of-the-art ASR software is freely available, the language dependent acoustic models are lacking for languages other than English, due to the limited amount of freely available training data. We train acoustic models for German with Kaldi on two datasets, which are both distributed under a Creative Commons license. The resulting model is freely redistributable, lowering the cost of entry for German ASR. The models are trained on a total of 412 hours of German read speech data and we achieve a relative word error reduction of 26% by adding data from the Spoken Wikipedia Corpus to the previously best freely available German acoustic model recipe and dataset. Our best model achieves a word error rate of 14.38 on the Tuda-De test set. Due to the large amount of speakers and the diversity of topics included in the training data, our model is robust against speaker variation and topic shift.

* Accepted at ITG 2018 

  Access Paper or Ask Questions