Pabitra Mitra

Visual Attention for Behavioral Cloning in Autonomous Driving

Dec 05, 2018

Sourav Pal, Tharun Mohandoss, Pabitra Mitra

Abstract: The goal of our work is to use visual attention to enhance autonomous driving performance. We present two methods of predicting visual attention maps. The first method is a supervised learning approach in which we collect eye-gaze data for the task of driving and use it to train a model for predicting the attention map. The second method is a novel unsupervised approach in which we train a model to learn to predict attention as it learns to drive a car. Finally, we present a comparative study of our results and show that incorporating attention predicted by the supervised approach yields better performance than the other approaches.

* Paper Accepted at ICMV (2018)

Improving Consistency and Correctness of Sequence Inpainting using Semantically Guided Generative Adversarial Network

Nov 17, 2017

Avisek Lahiri, Arnav Jain, Prabir Kumar Biswas, Pabitra Mitra

Abstract: Contemporary benchmark methods for image inpainting are based on deep generative models and specifically leverage adversarial loss for yielding realistic reconstructions. However, these models cannot be directly applied to image/video sequences because of an intrinsic drawback: the reconstructions might be independently realistic but, when visualized as a sequence, often lack fidelity to the original uncorrupted sequence. The fundamental reason is that these methods try to find the best matching latent space representation near the natural image manifold without any explicit distance-based loss. In this paper, we present a semantically conditioned Generative Adversarial Network (GAN) for sequence inpainting. The conditional information constrains the GAN to map a latent representation to a point on the image manifold respecting the underlying pose and semantics of the scene. To the best of our knowledge, this is the first work that simultaneously addresses consistency and correctness of generative-model-based inpainting. We show that our generative model learns to disentangle pose and appearance information; this independence is exploited by our model to generate highly consistent reconstructions. The conditional information also aids the generator network of the GAN in producing sharper images compared to the original GAN formulation. This helps in achieving more appealing inpainting performance. Though generic, our algorithm was targeted at inpainting faces. When applied to the CelebA and Youtube Faces datasets, the proposed method results in a significant improvement over the current benchmark, both in terms of quantitative evaluation (Peak Signal to Noise Ratio) and human visual scoring over diversified combinations of resolutions and deformations.
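
The quantitative metric named in the abstract, Peak Signal to Noise Ratio, can be sketched as follows. This is a minimal illustration of the standard definition, not the authors' evaluation code; the function name `psnr`, the nested-list image format, and the default peak value of 255 are assumptions made here.

```python
import math

def psnr(reference, reconstruction, max_value=255.0):
    """Peak Signal to Noise Ratio (dB) between two equally sized grayscale
    images given as nested lists of pixel values (assumed format)."""
    ref = [p for row in reference for p in row]
    rec = [p for row in reconstruction for p in row]
    if not ref or len(ref) != len(rec):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the inpainted image is closer to the uncorrupted original; identical images give an infinite score.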

Recurrent Memory Addressing for describing videos

Mar 23, 2017

Arnav Kumar Jain, Abhinav Agarwalla, Kumar Krishna Agrawal, Pabitra Mitra

Abstract: In this paper, we introduce Key-Value Memory Networks to a multimodal setting and a novel key-addressing mechanism to deal with sequence-to-sequence models. The proposed model naturally decomposes the problem of video captioning into vision and language segments, dealing with them as key-value pairs. More specifically, we learn a semantic embedding (v) corresponding to each frame (k) in the video, thereby creating (k, v) memory slots. We propose to find the next-step attention weights conditioned on the previous attention distributions for the key-value memory slots in the memory addressing schema. Exploiting this flexibility of the framework, we additionally capture spatial dependencies while mapping from the visual to the semantic embedding. Experiments on the Youtube2Text dataset demonstrate the usefulness of recurrent key-addressing, while achieving competitive scores on the BLEU@4 and METEOR metrics against state-of-the-art models.
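
To make the (k, v) memory slots concrete, here is a minimal sketch of a generic key-value memory read: a query is scored against every key, the scores are normalized with a softmax, and the values are averaged with those weights. The names `softmax` and `read_memory` and the dot-product scoring are assumptions for illustration; the recurrent conditioning on previous attention distributions that the paper proposes is not shown.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(query, keys, values):
    """Address memory slots by key and return (attention weights, read vector).

    keys[i] stands for a frame representation and values[i] for its semantic
    embedding (assumed layout)."""
    # Dot-product similarity between the query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Attention-weighted sum of the value vectors.
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, read
```

With two identical keys the weights are uniform, so the read vector is simply the mean of the two values.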

Visualization Regularizers for Neural Network based Image Recognition

Jan 03, 2017

Biswajit Paria, Vikas Reddy, Anirban Santara, Pabitra Mitra

Abstract: The success of deep neural networks is mostly due to their ability to learn meaningful features from the data. Features learned in the hidden layers of deep neural networks trained on computer vision tasks have been shown to be similar to mid-level vision features. We leverage this fact in this work and propose the visualization regularizer for image tasks. The proposed regularization technique enforces smoothness of the features learned by hidden nodes and turns out to be a special case of Tikhonov regularization. We achieve higher classification accuracy than existing regularizers, such as the L2 norm regularizer and dropout, on benchmark datasets without changing the training computational complexity.
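
One way to picture a smoothness penalty on a hidden node's feature is to reshape its incoming weights into an image and sum the squared differences between neighbouring pixels. The sketch below does that; it is an assumed illustration, not the exact regularizer from the paper. The function name `smoothness_penalty` and the row-major weight layout are hypothetical. Because the penalty is the squared norm of a linear difference operator applied to the weights, it has the general Tikhonov form.

```python
def smoothness_penalty(weights, height, width):
    """Sum of squared differences between horizontally and vertically
    adjacent weights, with `weights` viewed as a height x width image
    stored in row-major order (assumed layout)."""
    if len(weights) != height * width:
        raise ValueError("weights do not match the given image shape")
    total = 0.0
    for r in range(height):
        for c in range(width):
            w = weights[r * width + c]
            if c + 1 < width:   # right neighbour
                total += (weights[r * width + c + 1] - w) ** 2
            if r + 1 < height:  # neighbour below
                total += (weights[(r + 1) * width + c] - w) ** 2
    return total
```

In training, a term like this would be scaled by a coefficient and added to the loss; a constant weight image incurs no penalty, while a checkerboard pattern is penalized heavily.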

BASS Net: Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification

Dec 02, 2016

Anirban Santara, Kaustubh Mani, Pranoot Hatwar, Ankit Singh, Ankur Garg, Kirti Padia, Pabitra Mitra

Abstract: Deep learning based landcover classification algorithms have recently been proposed in the literature. In hyperspectral images (HSI), they face the challenges of large dimensionality, spatial variability of spectral signatures, and scarcity of labeled data. In this article, we propose an end-to-end deep learning architecture that extracts band-specific spectral-spatial features and performs landcover classification. The architecture has fewer independent connection weights and thus requires less training data. The method is found to outperform the highest reported accuracies on popular hyperspectral image data sets.

* 8 pages, 10 figures, Submitted to IEEE TGRS, Code available at: https://github.com/kaustubh0mani/BASS-Net

WEPSAM: Weakly Pre-Learnt Saliency Model

May 03, 2016

Avisek Lahiri, Sourya Roy, Anirban Santara, Pabitra Mitra, Prabir Kumar Biswas

Abstract: Visual saliency detection tries to mimic human vision psychology, which concentrates on sparse, important areas in natural images. Saliency prediction research has traditionally been based on low-level features such as contrast, edges, etc. The recent thrust in saliency prediction research is to learn high-level semantics using ground truth eye fixation datasets. In this paper, we present WEPSAM: Weakly Pre-Learnt Saliency Model, a pioneering effort at using domain-specific pre-learning on ImageNet for saliency prediction with a lightweight CNN architecture. The paper proposes a two-step hierarchical learning scheme, in which the first step is to develop a framework for weak pre-training on a large-scale dataset such as ImageNet, which is void of human eye fixation maps. The second step refines the pre-trained model on a limited set of ground truth fixations. Analysis of the loss on the iSUN and SALICON datasets reveals that the pre-trained network converges much faster than a randomly initialized network. WEPSAM also outperforms some recent state-of-the-art saliency prediction models on the challenging MIT300 dataset.

Visual saliency detection: a Kalman filter based approach

Apr 17, 2016

Sourya Roy, Pabitra Mitra

Abstract: In this paper, we propose a Kalman filter aided saliency detection model based on the conjecture that salient regions are considerably different from our "visual expectation", or are "visually surprising" in nature. In this work, we have structured our model with the immediate objective of predicting saliency in static images. However, the proposed model can be easily extended to space-time saliency prediction. Our approach was evaluated using two publicly available benchmark data sets, and the results have been compared with other existing saliency models. The results clearly illustrate the superior performance of the proposed model over other approaches.
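
The "visual expectation" idea can be illustrated with a textbook scalar Kalman filter that tracks a slowly varying intensity and reports how far each new observation falls from the prediction. This is only a sketch of the general principle under assumed settings, not the model from the paper: the function name `kalman_surprise`, the constant-state model, and the noise variances are all hypothetical.

```python
def kalman_surprise(observations, process_var=1e-2, measurement_var=1.0):
    """Return |innovation| for each observation under a scalar Kalman filter
    with a constant-state model (assumed); larger values mean the observation
    deviates more from what the filter expected."""
    if not observations:
        return []
    estimate = observations[0]  # initialise the expectation from the first sample
    error_var = 1.0             # assumed initial uncertainty
    surprises = [0.0]
    for z in observations[1:]:
        # Predict: the state is assumed constant, so only uncertainty grows.
        error_var += process_var
        # Innovation: gap between the observation and the expectation.
        innovation = z - estimate
        surprises.append(abs(innovation))
        # Update the expectation with the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * innovation
        error_var *= (1.0 - gain)
    return surprises
```

On a flat sequence the surprise stays at zero, while an isolated outlier produces a spike at its position.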

Ensemble of Deep Convolutional Neural Networks for Learning to Detect Retinal Vessels in Fundus Images

Mar 15, 2016

Debapriya Maji, Anirban Santara, Pabitra Mitra, Debdoot Sheet

Abstract: Vision impairment due to pathological damage of the retina can largely be prevented through periodic screening using fundus color imaging. However, the challenge with large-scale screening is the inability to exhaustively detect fine blood vessels crucial to disease diagnosis. In this work, we present a computational imaging framework using deep and ensemble learning for reliable detection of blood vessels in fundus color images. An ensemble of deep convolutional neural networks is trained to segment vessel and non-vessel areas of a color fundus image. During inference, the responses of the individual ConvNets of the ensemble are averaged to form the final segmentation. In experimental evaluation with the DRIVE database, we achieve the objective of vessel detection with a maximum average accuracy of 94.7% and area under the ROC curve of 0.9283.
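
The inference step described above, averaging the responses of the individual ConvNets, can be sketched as follows. The per-network outputs are assumed to be vessel-probability maps stored as nested lists; the function name `average_ensemble` and the 0.5 threshold used to obtain a binary mask are assumptions, not details from the paper.

```python
def average_ensemble(prob_maps, threshold=0.5):
    """Average per-pixel vessel probabilities from several networks and
    threshold the mean into a binary vessel mask (threshold is assumed)."""
    if not prob_maps:
        raise ValueError("need at least one probability map")
    n = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    # Pixel-wise mean over the ensemble members.
    mean = [[sum(m[r][c] for m in prob_maps) / n for c in range(width)]
            for r in range(height)]
    mask = [[1 if p >= threshold else 0 for p in row] for row in mean]
    return mean, mask
```

Averaging smooths out the errors of individual networks: a pixel is labelled as vessel only when the ensemble as a whole assigns it a high enough probability.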

Faster learning of deep stacked autoencoders on multi-core systems using synchronized layer-wise pre-training

Mar 09, 2016

Anirban Santara, Debapriya Maji, DP Tejas, Pabitra Mitra, Arobinda Gupta

Abstract: Deep neural networks are capable of modelling highly non-linear functions by capturing different levels of abstraction of data hierarchically. When training deep networks, the system is first initialized near a good optimum by greedy layer-wise unsupervised pre-training. However, with burgeoning data and increasing dimensions of the architecture, the time complexity of this approach becomes enormous. Also, greedy pre-training of the layers often turns detrimental by over-training a layer, causing it to lose harmony with the rest of the network. In this paper, a synchronized parallel algorithm for pre-training deep networks on multi-core machines is proposed. Different layers are trained by parallel threads running on different cores with regular synchronization. Thus the pre-training process becomes faster and the chances of over-training are reduced. This is experimentally validated using a stacked autoencoder for dimensionality reduction of the MNIST handwritten digit database. The proposed algorithm achieved a 26% speed-up compared to greedy layer-wise pre-training for achieving the same reconstruction accuracy, substantiating its potential as an alternative.

A dense subgraph based algorithm for compact salient image region detection

Dec 19, 2015

Souradeep Chakraborty, Pabitra Mitra

Abstract: We present an algorithm for graph-based saliency computation that utilizes the underlying dense subgraphs in finding visually salient regions in an image. To compute the salient regions, the model first obtains a saliency map using random walks on a Markov chain. Next, k-dense subgraphs are detected to further enhance the salient regions in the image. Dense subgraphs convey more information about local graph structure than simple centrality measures. To generate the Markov chain, intensity and color features of an image, in addition to region compactness, are used. To evaluate the proposed model, we perform extensive experiments on benchmark image data sets. The proposed method performs comparably to well-known algorithms in salient region detection.

* 33 pages, 18 figures, Single column manuscript pre-print, Accepted at Computer Vision and Image Understanding, Elsevier
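
The random-walk step in the abstract above can be illustrated with a generic sketch: an affinity matrix between image regions is row-normalized into a Markov transition matrix, and the walk's stationary distribution is found by power iteration. How the paper builds its affinities and turns the walk into a saliency map is not specified here; the function name `stationary_distribution`, the iteration count, and the reading of the result as a per-region score are assumptions. The dense-subgraph step is not shown.

```python
def stationary_distribution(affinity, iterations=200):
    """Stationary distribution of the random walk defined by a non-negative
    affinity matrix (nested lists; every row must have a positive sum)."""
    n = len(affinity)
    # Row-normalize affinities into transition probabilities.
    transition = [[a / sum(row) for a in row] for row in affinity]
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        # One step of the walk: pi <- pi * P.
        pi = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return pi
```

Power iteration converges when the chain is irreducible and aperiodic; regions the walk visits more often receive a larger share of the probability mass.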
