Brejesh Lall

Image fusion using symmetric skip autoencoder via an Adversarial Regulariser

Jun 04, 2020

Snigdha Bhagat, S. D. Joshi, Brejesh Lall

Abstract: Combining the spatial characteristics of a visible image with the spectral content of an infrared image, and so extracting the best of both, is a challenging task. In this work, we propose a spatially constrained adversarial autoencoder that extracts deep features from the infrared and visible images to obtain a more exhaustive and global representation. Specifically, we propose a residual autoencoder architecture, regularised by a residual adversarial network, to generate a more realistic fused image. The residual module serves as the primary building block for the encoder, decoder and adversarial network; in addition, symmetric skip connections embed the spatial characteristics directly from the initial layers of the encoder into the decoder part of the network. The spectral information in the infrared image is incorporated by adding the feature maps over several layers in the encoder part of the fusion structure, which makes inference on the visual and infrared images separately. To optimize the parameters of the network efficiently, we propose an adversarial regulariser network that performs supervised learning on the fused image and the original visual image.

Deep feature fusion for self-supervised monocular depth prediction

May 16, 2020

Vinay Kaushik, Brejesh Lall

Abstract: Recent advances in end-to-end unsupervised learning have significantly improved the performance of monocular depth prediction and alleviated the requirement for ground-truth depth. Although a plethora of work has been done on enforcing various structural constraints by incorporating multiple losses utilising smoothness, left-right consistency, regularisation and matching surface normals, few of these works take into consideration the multi-scale structures present in real-world images. Most works utilise a VGG16 or ResNet50 model pre-trained on ImageNet for predicting depth. We propose a deep feature fusion method utilising features at multiple scales for learning self-supervised depth from scratch. Our fusion network selects features from both upper and lower levels at every level in the encoder network, thereby creating multiple feature pyramid sub-networks that are fed to the decoder after applying the CoordConv solution. We also propose a refinement module that learns higher-scale residual depth from a combination of higher-level deep features and lower-level residual depth, using a pixel shuffling framework that super-resolves the lower-level residual depth. We select the KITTI dataset for evaluation and show that our proposed architecture can produce better or comparable results in depth prediction.

* 4 pages, 2 Tables, 2 Figures
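
The pixel shuffling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common sub-pixel convention, in which a tensor with `C * r * r` channels of size `H x W` is rearranged into `C` channels of size `(H * r) x (W * r)`. The function name `pixel_shuffle` and the nested-list layout are illustrative choices, and the paper's refinement module may differ in its details.

```python
def pixel_shuffle(x, r):
    """Rearrange x of shape [C*r*r][H][W] into [C][H*r][W*r].

    Assumed convention: output pixel (i, j) of channel c is read from input
    channel c*r*r + (i % r)*r + (j % r) at location (i // r, j // r).
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out


# Four 1x1 channels become a single 2x2 channel when r = 2.
low_res = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
high_res = pixel_shuffle(low_res, 2)
```

The operation only moves values around, so any learned upsampling has to come from the convolution that produces the `C * r * r` channels beforehand.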

Compressive sensing based privacy for fall detection

Jan 10, 2020

Ronak Gupta, Prashant Anand, Santanu Chaudhury, Brejesh Lall, Sanjay Singh

Abstract: Fall detection holds immense importance in healthcare, where timely detection allows for immediate medical assistance. In this context, we propose a 3D ConvNet architecture consisting of 3D Inception modules for fall detection. The proposed architecture is a custom version of the Inflated 3D (I3D) architecture that takes compressed measurements of a video sequence, obtained from a compressive sensing framework, as spatio-temporal input, rather than the video sequence itself as in the I3D convolutional neural network. This design is adopted because privacy is a major concern for patients being monitored through RGB cameras. The proposed framework for fall detection is flexible with respect to a wide variety of measurement matrices. Ten action classes randomly selected from Kinetics-400, with no fall examples, are used to train our 3D ConvNet after applying compressive sensing with different types of sensing matrices to the original video clips. Our results show that 3D ConvNet performance remains unchanged across different sensing matrices. Also, the performance obtained with the Kinetics pre-trained 3D ConvNet on compressively sensed fall videos from benchmark datasets is better than that of state-of-the-art techniques.

* Accepted at NCVPRIPG 2019
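
As a rough illustration of the compressive sensing input described above, the sketch below computes measurements `y = Phi x` for each flattened frame using a random Gaussian sensing matrix. The Gaussian matrix, the frame-wise application, and all function names are assumptions made for illustration; the abstract does not state which measurement matrices or block sizes the authors use.

```python
import random


def gaussian_sensing_matrix(m, n, seed=0):
    """Return an m x n matrix with i.i.d. N(0, 1/m) entries (assumed choice)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compute the measurement vector y = phi @ x for one flattened frame."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def compress_clip(phi, frames):
    """Apply the same sensing matrix to every frame of a clip."""
    return [compress(phi, frame) for frame in frames]


# A toy clip: 3 frames of 16 pixels each, reduced to 4 measurements per frame.
phi = gaussian_sensing_matrix(4, 16)
clip = [[float(t + p) for p in range(16)] for t in range(3)]
measurements = compress_clip(phi, clip)
```

With `m` much smaller than `n`, the measurements are not directly viewable as an image, which is the property the privacy argument relies on.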

Aerial multi-object tracking by detection using deep association networks

Sep 04, 2019

Ajit Jadhav, Prerana Mukherjee, Vinay Kaushik, Brejesh Lall

Abstract: A lot of research is focused on object detection, which has achieved significant advances with deep learning techniques in recent years. In spite of the existing research, these algorithms are usually not optimal for sequences or images captured by drone-based platforms, due to challenges such as viewpoint change, scale variation, density of object distribution and occlusion. In this paper, we develop a model for detection of objects in drone images using the VisDrone2019 DET dataset. Using the RetinaNet model as our base, we modify the anchor scales to better handle the dense distribution and small size of the objects. We explicitly model the channel interdependencies by using "Squeeze-and-Excitation" (SE) blocks that adaptively recalibrate channel-wise feature responses. This brings significant improvements in performance at a slight additional computational cost. Using this architecture for object detection, we build a custom DeepSORT network for object detection on the VisDrone2019 MOT dataset by training a custom Deep Association network for the algorithm.
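
The channel recalibration performed by a Squeeze-and-Excitation block can be sketched as follows. This is a generic SE block (global average pooling, a two-layer bottleneck with ReLU and sigmoid, then channel-wise scaling) written with plain Python lists; the weight shapes and function names are illustrative assumptions and do not reflect how the authors integrate SE blocks into RetinaNet.

```python
import math


def squeeze(feature_maps):
    """Global average pooling: one scalar per channel."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]


def dense(weights, bias, vec):
    """Fully connected layer: weights is [out][in], bias is [out]."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_block(feature_maps, w1, b1, w2, b2):
    """Scale each channel by a gate in (0, 1) computed from all channels."""
    z = squeeze(feature_maps)
    hidden = [max(0.0, h) for h in dense(w1, b1, z)]           # ReLU bottleneck
    gates = [1.0 / (1.0 + math.exp(-s)) for s in dense(w2, b2, hidden)]
    return [[[g * v for v in row] for row in fm]
            for fm, g in zip(feature_maps, gates)]


# Two 2x2 channels with an all-zero bottleneck: every gate is sigmoid(0) = 0.5.
fm = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
out = se_block(fm, [[0.0, 0.0]], [0.0], [[0.0], [0.0]], [0.0, 0.0])
```

The extra cost is two small dense layers per block, which is consistent with the "slight additional computational cost" mentioned in the abstract.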

Learning Activation Functions: A new paradigm for understanding Neural Networks

Jul 08, 2019

Mohit Goyal, Rajan Goyal, Brejesh Lall

Abstract: The scope of research in the domain of activation functions remains limited and centered around improving the ease of optimization or the generalization quality of neural networks (NNs). However, to develop a deeper understanding of deep learning, it becomes important to look at the non-linear component of NNs more carefully. In this paper, we aim to provide a generic form of activation function, along with appropriate mathematical grounding, so as to allow for insights into the working of NNs in the future. We propose "Self-Learnable Activation Functions" (SLAF), which are learned during training and are capable of approximating most of the existing activation functions. SLAF is given as a weighted sum of pre-defined basis elements, which can serve as a good approximation of the optimal activation function. The coefficients for these basis elements allow a search in the entire space of continuous functions (consisting of all the conventional activations). We propose various training routines which can be used to achieve performance with SLAF-equipped neural networks (SLNNs). We prove that SLNNs can approximate any neural network with Lipschitz continuous activations to any arbitrary error, highlighting their capacity and possible equivalence with standard NNs. Also, SLNNs can be completely represented as a collection of finite-degree polynomials up to the very last layer, obviating several hyperparameters such as width and depth. Since the optimization of SLNNs is still a challenge, we show that using SLAF along with standard activations (like ReLU) can provide performance improvements with only a small increase in the number of parameters.

* Article submitted to the Asian Conference on Machine Learning (ACML 2019)
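
A weighted sum of basis elements, as described in the abstract, can be sketched with a monomial basis `1, x, x^2, ...`. The monomial basis is an assumption suggested by the abstract's reference to finite-degree polynomials; the function names are hypothetical and the paper's training routines are not reproduced here.

```python
def slaf(x, coeffs):
    """Evaluate sum_i coeffs[i] * x**i using Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def slaf_grad_coeffs(x, coeffs):
    """Gradient of the activation with respect to each coefficient.

    Because the activation is linear in its coefficients, the gradient with
    respect to coeffs[i] is simply the basis element x**i.
    """
    return [x ** i for i in range(len(coeffs))]


# f(x) = 1 + 3x^2 evaluated at x = 2.
value = slaf(2.0, [1.0, 0.0, 3.0])
grads = slaf_grad_coeffs(2.0, [1.0, 0.0, 3.0])
```

Since the coefficients enter linearly, they can be updated by the same gradient-based optimizer as the network weights.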

Few Shot Speaker Recognition using Deep Neural Networks

Apr 17, 2019

Prashant Anand, Ajeet Kumar Singh, Siddharth Srivastava, Brejesh Lall

Abstract: Recent advances in deep learning are mostly driven by the availability of large amounts of training data. However, such data is not always available for specific tasks such as speaker recognition, where collecting large amounts of data is not possible in practical scenarios. Therefore, in this paper, we propose to identify speakers by learning from only a few training examples. To achieve this, we use a deep neural network with prototypical loss, where the input to the network is a spectrogram. For output, we project the class feature vectors into a common embedding space, followed by classification. Further, we show the effectiveness of a capsule network in a few-shot learning setting. To this end, we utilize an auto-encoder to learn generalized feature embeddings from the class-specific embeddings obtained from the capsule network. We provide exhaustive experiments on publicly available datasets and competitive baselines, demonstrating the superiority and generalization ability of the proposed few-shot learning pipelines.
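
The prototypical loss mentioned above can be sketched under the assumption that it follows the usual prototypical-network formulation: each class prototype is the mean of its support embeddings, and a query is scored by a softmax over negative squared Euclidean distances. The embeddings below are toy 2-D vectors standing in for the output of the spectrogram network, and all names are illustrative.

```python
import math


def prototypes(support):
    """Map each label to the mean of its support embeddings."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors)
                         for d in range(dim)]
    return protos


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(query, label, protos):
    """Negative log-probability of the true label under a distance softmax."""
    logits = {k: -squared_distance(query, p) for k, p in protos.items()}
    m = max(logits.values())                      # stabilise the log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[label]


def classify(query, protos):
    """Predict the label of the nearest prototype."""
    return min(protos, key=lambda k: squared_distance(query, protos[k]))


support = {"speaker_a": [[0.0, 0.0], [0.0, 2.0]],
           "speaker_b": [[10.0, 10.0], [10.0, 12.0]]}
protos = prototypes(support)
prediction = classify([1.0, 1.0], protos)
```

Only a handful of support examples per speaker are needed to form a prototype, which is what makes this formulation a natural fit for the few-shot setting.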

VayuAnukulani: Adaptive Memory Networks for Air Pollution Forecasting

Apr 08, 2019

Divyam Madaan, Radhika Dua, Prerana Mukherjee, Brejesh Lall

Abstract: Air pollution is the leading environmental health hazard globally, arising from various sources that include factory emissions, car exhaust and cooking stoves. Air pollution forecasts serve as the basis for taking effective pollution control measures, so accurate air pollution forecasting has become an important task. In this paper, we forecast fine-grained ambient air quality information for 5 prominent locations in Delhi based on the historical and real-time ambient air quality and meteorological data reported by the Central Pollution Control Board. We present the VayuAnukulani system, a novel end-to-end solution to predict air quality for the next 24 hours by estimating the concentration and level of different air pollutants, including nitrogen dioxide ($NO_2$) and particulate matter ($PM_{2.5}$ and $PM_{10}$), for Delhi. Extensive experiments on data sources obtained in Delhi demonstrate that the proposed adaptive attention-based Bidirectional LSTM network outperforms several baselines for classification and regression models. The accuracy of the proposed adaptive system is $\sim 15 - 20\%$ better than that of the same offline-trained model. We compare the proposed methodology against several competing baselines and show that the network outperforms conventional methods by $\sim 3 - 5 \%$.

DSAL-GAN: Denoising based Saliency Prediction with Generative Adversarial Networks

Apr 02, 2019

Prerana Mukherjee, Manoj Sharma, Megh Makwana, Ajay Pratap Singh, Avinash Upadhyay, Akkshita Trivedi, Brejesh Lall, Santanu Chaudhury

Abstract: Synthesizing high-quality saliency maps from noisy images is a challenging problem in computer vision and has many practical applications. Existing techniques for saliency detection cannot handle noise perturbations smoothly and fail to delineate the salient objects present in the given scene. In this paper, we present a novel end-to-end coupled Denoising based Saliency Prediction with Generative Adversarial Network (DSAL-GAN) framework to address the problem of salient object detection in noisy images. DSAL-GAN consists of two generative adversarial networks (GANs) trained end-to-end to perform denoising and saliency prediction together in a holistic manner. The first GAN consists of a generator that denoises the noisy input image and a discriminator that checks whether its input is a denoised image or the ground-truth original image. The second GAN predicts the saliency maps from raw pixels of the input denoised image using a data-driven metric based on a saliency prediction method with adversarial loss. A cycle consistency loss is also incorporated to further improve salient region prediction. We demonstrate with comprehensive evaluation that the proposed framework outperforms several baseline saliency models on various performance benchmarks.

DeepPoint3D: Learning Discriminative Local Descriptors using Deep Metric Learning on 3D Point Clouds

Mar 27, 2019

Siddharth Srivastava, Brejesh Lall

Abstract: Learning local descriptors is an important problem in computer vision. While there are many techniques for learning local patch descriptors for 2D images, efforts have recently been made to learn local descriptors for 3D points. Recent progress towards solving this problem in 3D leverages the strong feature representation capability of image-based convolutional neural networks by utilizing RGB-D or multi-view representations. In this paper, however, we propose to learn 3D local descriptors by directly processing unstructured 3D point clouds, without needing any intermediate representation. The method comprises a deep network for learning a permutation-invariant representation of 3D points. To learn the local descriptors, we use a multi-margin contrastive loss which discriminates between similar and dissimilar points on a surface while also leveraging the extent of dissimilarity among the negative samples at the time of training. With comprehensive evaluation against strong baselines, we show that the proposed method outperforms state-of-the-art methods for matching points in 3D point clouds. Further, we demonstrate the effectiveness of the proposed method on various applications, achieving state-of-the-art results.

Fast Hierarchical Depth Map Computation from Stereo

Jan 28, 2019

Vinay Kaushik, Brejesh Lall

Abstract: Disparity estimation by block matching stereo is commonly used to obtain depth estimates in applications with limited computational power. However, simple stereo methods have received less research attention than their energy-based counterparts, which promise a better-quality depth map with more potential for future improvements. Semi-global matching (SGM) methods offer good performance and easy implementation but suffer from a very high memory footprint because they work on the full disparity space image. Block matching stereo, on the other hand, needs much less memory. In this paper, we introduce a novel multi-scale hierarchical block-matching approach using a pyramidal variant of depth and cost functions, which drastically improves the results of standard block matching stereo techniques while preserving the low memory footprint and further reducing the complexity of standard block matching. We tested our new multi block matching scheme on the Middlebury stereo benchmark, where we obtain results that are only slightly worse than state-of-the-art SGM implementations.

* Submitted to International Conference on Pattern Recognition and Artificial Intelligence, 2018
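
For context, the standard block matching baseline that the abstract builds on can be sketched on a single rectified scanline: for each pixel in the left row, pick the disparity whose window in the right row has the lowest sum of absolute differences (SAD). This is plain single-scale block matching with hypothetical function names, not the hierarchical multi-scale scheme the paper proposes.

```python
def sad(left, right, x, d, half):
    """Sum of absolute differences between a left window centred at x and
    the right window shifted by disparity d."""
    return sum(abs(left[x + k] - right[x + k - d])
               for k in range(-half, half + 1))


def block_match_row(left, right, max_disp, half=1):
    """Return one integer disparity per pixel of a rectified scanline."""
    n = len(left)
    disparities = [0] * n                  # border pixels keep disparity 0
    for x in range(half, n - half):
        best_d, best_cost = 0, None
        for d in range(max_disp + 1):
            if x - half - d < 0:           # window would leave the image
                break
            cost = sad(left, right, x, d, half)
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


# The feature at index 3 of the right row appears at index 5 of the left row.
right_row = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0]
left_row = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
disp = block_match_row(left_row, right_row, max_disp=3)
```

Only the current best cost per pixel is stored, which is why the memory footprint stays small compared with methods that keep the full disparity space image.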
