"Information": models, code, and papers

CRRN: Multi-Scale Guided Concurrent Reflection Removal Network

May 30, 2018
Renjie Wan, Boxin Shi, Ling-Yu Duan, Ah-Hwee Tan, Alex C. Kot

Removing undesired reflections from images taken through glass has broad application to various computer vision tasks. Non-learning-based methods rely on handcrafted priors, such as the separable sparse gradients caused by different levels of blur, which often fail because of their limited ability to describe the properties of real-world reflections. In this paper, we propose the Concurrent Reflection Removal Network (CRRN) to tackle this problem in a unified framework. Our proposed network integrates image appearance information and multi-scale gradient information with a human-perception-inspired loss function, and is trained on a new dataset of 3250 reflection images taken in diverse real-world scenes. Extensive experiments on a public benchmark dataset show that the proposed method performs favorably against state-of-the-art methods.

* Accepted by CVPR 2018 


Hierarchical Reinforcement Learning with Deep Nested Agents

May 18, 2018
Marc Brittain, Peng Wei

Deep hierarchical reinforcement learning has gained a lot of attention in recent years due to its ability to produce state-of-the-art results in challenging environments where non-hierarchical frameworks fail to learn useful policies. However, as problem domains become more complex, deep hierarchical reinforcement learning can become inefficient, leading to longer convergence times and poor performance. We introduce the Deep Nested Agent framework, a variant of deep hierarchical reinforcement learning in which information from the main agent is propagated to the low-level nested agent by incorporating this information into the nested agent's state. We demonstrate the effectiveness and performance of the Deep Nested Agent framework by applying it to three scenarios in Minecraft, with comparisons to a deep non-hierarchical single-agent framework as well as a deep hierarchical framework.

* 11 pages 
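
The abstract above says only that information from the main agent is incorporated into the nested agent's state; it does not say how that information is encoded. As a minimal sketch, assuming the propagated information is the main agent's chosen action and that it is appended to the observation as a one-hot vector (both assumptions, as are the names `one_hot` and `nested_agent_state`), the augmented state could be built like this:

```python
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    """Encode a discrete choice as a one-hot vector of the given size."""
    if not 0 <= index < size:
        raise ValueError("index must lie in [0, size)")
    return [1.0 if i == index else 0.0 for i in range(size)]


def nested_agent_state(
    observation: List[float], main_action: int, num_main_actions: int
) -> List[float]:
    """Build the nested agent's state (hypothetical encoding).

    The environment observation is concatenated with a one-hot encoding of
    the main agent's action, so the nested policy can condition on it.
    """
    return list(observation) + one_hot(main_action, num_main_actions)


if __name__ == "__main__":
    # Toy values: a 3-dimensional observation and 3 possible main-agent actions.
    state = nested_agent_state([0.2, -1.0, 0.5], main_action=1, num_main_actions=3)
    print(state)
```

The nested agent's policy would then take this longer vector as its input in place of the raw observation.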


Tell Me Why Is It So? Explaining Knowledge Graph Relationships by Finding Descriptive Support Passages

Mar 17, 2018
Sumit Bhatia, Purusharth Dwivedi, Avneet Kaur

We address the problem of finding descriptive explanations of facts stored in a knowledge graph. This is important in high-risk domains such as healthcare and intelligence, where users need additional information for decision making. It is especially crucial for applications that rely on automatically constructed knowledge bases, where machine-learned systems extract facts from an input corpus and the workings of the extractors are opaque to the end user. We follow an approach inspired by information retrieval and propose a simple, efficient, yet effective solution that takes into account passage-level as well as document-level properties to produce a ranked list of passages describing a given input relation. We test our approach using Wikidata as the knowledge base and Wikipedia as the source corpus, and report the results of user studies conducted to assess the effectiveness of our proposed model.

* 12 pages 


Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

Mar 07, 2018
Wei-Ning Hsu, James Glass

The performance of automatic speech recognition (ASR) systems can be significantly compromised by previously unseen conditions, which is typically due to a mismatch between training and testing distributions. In this paper, we address robustness by studying domain invariant features, such that domain information becomes transparent to ASR systems, resolving the mismatch problem. Specifically, we investigate a recent model, called the Factorized Hierarchical Variational Autoencoder (FHVAE). FHVAEs learn to factorize sequence-level and segment-level attributes into different latent variables without supervision. We argue that the set of latent variables that contain segment-level information is our desired domain invariant feature for ASR. Experiments are conducted on Aurora-4 and CHiME-4, which demonstrate 41% and 27% absolute word error rate reductions respectively on mismatched domains.

* Accepted by the 2018 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018)


Deep Multimodal Learning for Emotion Recognition in Spoken Language

Feb 22, 2018
Yue Gu, Shuhong Chen, Ivan Marsic

In this paper, we present a novel deep multimodal framework to predict human emotions based on sentence-level spoken language. Our architecture has two distinctive characteristics. First, it extracts high-level features from both text and audio via a hybrid deep multimodal structure, which considers the spatial information from text, temporal information from audio, and high-level associations from low-level handcrafted features. Second, we fuse all features using a three-layer deep neural network to learn the correlations across modalities, and we train the feature extraction and fusion modules together, allowing optimal global fine-tuning of the entire structure. We evaluated the proposed framework on the IEMOCAP dataset. Our results show promising performance, achieving 60.4% weighted accuracy for five emotion categories.

* ICASSP 2018 


Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards

Nov 21, 2017
Junjie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton van den Hengel

Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be an inscrutable challenge. Towards this end, we propose a Deep Reinforcement Learning framework based on three new intermediate rewards, namely goal-achieved, progressive, and informativeness, that encourage the generation of succinct questions, which in turn uncover valuable information towards the overall goal. By directly optimizing for questions that work quickly towards fulfilling the overall goal, we avoid the tendency of existing methods to generate long series of inane queries that add little value. We evaluate our model on the GuessWhat?! dataset and show that the resulting questions can help a standard Guesser identify a specific object in an image at a much higher success rate.


Lens depth function and k-relative neighborhood graph: versatile tools for ordinal data analysis

Jul 24, 2017
Matthäus Kleindessner, Ulrike von Luxburg

In recent years it has become popular to study machine learning problems in a setting of ordinal distance information rather than numerical distance measurements. By ordinal distance information we refer to binary answers to distance comparisons such as $d(A,B)

* Journal of Machine Learning Research 18(58):1-52, 2017 


Automated Experiment Design for Data-Efficient Verification of Parametric Markov Decision Processes

Jul 05, 2017
Elizabeth Polgreen, Viraj Wijesuriya, Sofie Haesaert, Alessandro Abate

We present a new method for statistical verification of quantitative properties over a partially unknown system with actions, utilising a parameterised model (in this work, a parametric Markov decision process) and data collected from experiments performed on the underlying system. We obtain the confidence that the underlying system satisfies a given property, and show that the method uses data efficiently and thus is robust to the amount of data available. These characteristics are achieved by, firstly, exploiting parameter synthesis to establish a feasible set of parameters for which the underlying system will satisfy the property; secondly, actively synthesising experiments to increase the amount of information in the collected data that is relevant to the property; and finally, propagating this information over the model parameters, obtaining a confidence that reflects our belief about whether or not the system parameters lie in the feasible set, thereby solving the verification problem.

* QEST 2017, 18 pages, 7 figures 


Multiple VLAD encoding of CNNs for image classification

Jun 30, 2017
Qing Li, Qiang Peng, Chuan Yan

Despite the effectiveness of convolutional neural networks (CNNs), especially in image classification tasks, the effect of convolutional features on learned representations is still limited. These features mostly focus on the salient object in an image but ignore variation information in cluttered and local regions. In this paper, we propose a framework that applies multiple VLAD encoding to CNN features for image classification. Furthermore, to improve the performance of VLAD coding, we explore the multiplicity of VLAD encoding by extending it with three encoding algorithms: VLAD-SA, VLAD-LSA, and VLAD-LLC. Finally, we apply the spatial pyramid patch (SPM) to VLAD encoding to add spatial information to the CNN features. In particular, the power of SPM leads our framework to yield better performance than existing methods.
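
For readers unfamiliar with VLAD, the following is a minimal sketch of the standard hard-assignment VLAD encoding that the abstract builds on. It is not the paper's VLAD-SA, VLAD-LSA, or VLAD-LLC variants, and it omits the spatial pyramid step. The function names, the toy centroids, and the toy descriptors are assumptions for illustration; in practice the centroids would typically come from k-means over local CNN features.

```python
import math
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad_encode(
    descriptors: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]
) -> List[float]:
    """Standard (hard-assignment) VLAD encoding.

    Each descriptor is assigned to its nearest centroid, the residuals
    (descriptor minus centroid) are summed per centroid, and the
    concatenated result is L2-normalized.
    """
    k, dim = len(centroids), len(centroids[0])
    residuals = [[0.0] * dim for _ in range(k)]
    for desc in descriptors:
        nearest = min(range(k), key=lambda i: squared_distance(desc, centroids[i]))
        for j in range(dim):
            residuals[nearest][j] += desc[j] - centroids[nearest][j]
    flat = [value for row in residuals for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm > 0 else flat


if __name__ == "__main__":
    # Toy data: two 2-D centroids and three local descriptors.
    toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
    toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
    print(vlad_encode(toy_descriptors, toy_centroids))
```

The output vector has length k times the descriptor dimension, which is why VLAD gives a fixed-size image representation regardless of how many local descriptors an image produces.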


Recognizing Activities of Daily Living from Egocentric Images

Apr 13, 2017
Alejandro Cartas, Juan Marín, Petia Radeva, Mariella Dimiccoli

Recognizing Activities of Daily Living (ADLs) has a large number of health applications, such as characterizing lifestyle for habit improvement, nursing, and rehabilitation services. Wearable cameras can gather large amounts of image data daily, providing richer visual information about ADLs than other wearable sensors. In this paper, we explore the classification of ADLs from images captured by a low-temporal-resolution wearable camera (2 fpm) using a convolutional neural network (CNN) approach. We show that the classification accuracy of a CNN improves substantially when its output is combined, through a random decision forest, with contextual information from a fully connected layer. The proposed method was tested on a subset of the NTCIR-12 egocentric dataset consisting of 18,674 images, and achieved an overall activity recognition accuracy of 86% on 21 classes.

* To appear in the Proceedings of IbPRIA 2017 
