Xavier Giro-i-Nieto

Recurrent Neural Networks for Semantic Instance Segmentation

Sep 03, 2018
Amaia Salvador, Miriam Bellver, Victor Campos, Manel Baradad, Ferran Marques, Jordi Torres, Xavier Giro-i-Nieto

We present a recurrent model for semantic instance segmentation that sequentially generates binary masks and their associated class probabilities for every object in an image. Our proposed system is trainable end-to-end from an input image to a sequence of labeled masks and, unlike methods relying on object proposals, does not require post-processing steps on its output. We study the suitability of our recurrent model on three instance segmentation benchmarks, namely Pascal VOC 2012, CVPPP Plant Leaf Segmentation and Cityscapes. Further, we analyze the object sorting patterns generated by our model and observe that it learns to follow a consistent pattern, which correlates with the activations learned in the encoder part of our network. Source code and models are available at https://imatge-upc.github.io/rsis/

* An extended abstract of this work was presented at the CVPR 2018 DeepVision Workshop
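As a rough illustration of the inference loop described above, the following Python sketch decodes one labeled mask per recurrent step until a stop signal fires. The decoder is abstracted as a hypothetical `step_fn(state)` that returns mask probabilities, class probabilities, a stop probability and the next state; the stop probability, the thresholds and the toy decoder are assumptions made for this example, not the released RSIS code.

```python
from typing import Callable, List, Tuple

Mask = List[List[float]]
# Hypothetical decoder step: state -> (mask probs, class probs, stop prob, next state)
StepFn = Callable[[object], Tuple[Mask, List[float], float, object]]


def generate_instances(
    step_fn: StepFn,
    init_state: object,
    max_steps: int = 10,
    stop_threshold: float = 0.5,
    mask_threshold: float = 0.5,
) -> List[Tuple[List[List[int]], int, float]]:
    """Decode (binary mask, class id, class score) triples one object at a time."""
    state = init_state
    instances = []
    for _ in range(max_steps):
        mask_probs, class_probs, stop_prob, state = step_fn(state)
        if stop_prob > stop_threshold:  # hypothetical stop signal ends the sequence
            break
        # Threshold the soft mask; no proposal-style post-processing (e.g. NMS) is applied.
        binary_mask = [[1 if p > mask_threshold else 0 for p in row] for row in mask_probs]
        class_id = max(range(len(class_probs)), key=lambda c: class_probs[c])
        instances.append((binary_mask, class_id, class_probs[class_id]))
    return instances


def toy_step(t: int):
    """Hypothetical toy decoder on a 2x2 image: two objects, then a stop signal."""
    outputs = [
        ([[0.9, 0.1], [0.8, 0.2]], [0.1, 0.9], 0.0),
        ([[0.1, 0.7], [0.3, 0.6]], [0.7, 0.3], 0.1),
    ]
    if t < len(outputs):
        mask, classes, stop = outputs[t]
        return mask, classes, stop, t + 1
    return [[0.0, 0.0], [0.0, 0.0]], [0.5, 0.5], 0.99, t + 1


instances = generate_instances(toy_step, init_state=0)
```

Because each step emits a final mask and label directly, the output list is already the answer; the only knobs are the thresholds, which here are arbitrary illustrative values.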

PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks

Sep 03, 2018
Marc Assens, Xavier Giro-i-Nieto, Kevin McGuinness, Noel E. O'Connor

We introduce PathGAN, a deep neural network for visual scanpath prediction trained adversarially. A visual scanpath is defined as the sequence of fixation points over an image followed by a human observer with their gaze. PathGAN is composed of two parts, the generator and the discriminator. Both parts extract features from images using off-the-shelf networks and train recurrent layers to generate or discriminate scanpaths, respectively. In scanpath prediction, the stochastic nature of the data makes it very difficult to generate realistic predictions using supervised learning strategies, so we adopt adversarial training as a suitable alternative. Our experiments show that PathGAN improves the state of the art of visual scanpath prediction on the iSUN and Salient360! datasets. Source code and models are available at https://imatge-upc.github.io/pathgan/

* ECCV 2018 Workshop on Egocentric Perception, Interaction and Computing (EPIC). This work won 2nd place in Prediction of Head-gaze Scan-paths for Images and 2nd place in Prediction of Eye-gaze Scan-paths for Images at the IEEE ICME 2018 Salient360! Challenge

Online Detection of Action Start in Untrimmed, Streaming Videos

Jul 23, 2018
Zheng Shou, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giro-i-Nieto, Shih-Fu Chang

We aim to tackle a novel task in action detection: Online Detection of Action Start (ODAS) in untrimmed, streaming videos. The goal of ODAS is to detect the start of an action instance with high categorization accuracy and low detection latency. ODAS is important in many applications, such as early alert generation to allow a timely security or emergency response. We propose three novel methods to specifically address the challenges in training ODAS models: (1) hard negative sample generation based on a Generative Adversarial Network (GAN) to distinguish ambiguous background, (2) explicit modeling of the temporal consistency between data around the action start and data succeeding the action start, and (3) an adaptive sampling strategy to handle the scarcity of training data. We conduct extensive experiments using THUMOS'14 and ActivityNet and show that our proposed methods lead to significant performance gains and improve on state-of-the-art methods. An ablation study confirms the effectiveness of each proposed method.

* Accepted by ECCV'18

SalGAN: Visual Saliency Prediction with Generative Adversarial Networks

Jul 01, 2018
Junting Pan, Cristian Canton Ferrer, Kevin McGuinness, Noel E. O'Connor, Jordi Torres, Elisa Sayrol, Xavier Giro-i-Nieto

We introduce SalGAN, a deep convolutional neural network for visual saliency prediction trained adversarially. The first stage of the network consists of a generator model whose weights are learned by back-propagating a binary cross entropy (BCE) loss computed over downsampled versions of the saliency maps. The resulting prediction is processed by a discriminator network trained to solve a binary classification task between the saliency maps generated by the generative stage and the ground truth ones. Our experiments show that adversarial training, when combined with a widely used loss function such as BCE, reaches state-of-the-art performance across different metrics. Our results can be reproduced with the source code and trained models available at https://imatge-upc.github.io/saliency-salgan-2017/.

* Submitted for review to Computer Vision and Image Understanding (CVIU)
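A minimal sketch of the generator objective described above, assuming it is a weighted sum of a content BCE term computed over 2x2-average-pooled saliency maps and a non-saturating adversarial term driven by the discriminator's output. The weight `alpha`, the pooling factor and the function names are hypothetical choices for illustration, not values taken from the paper.

```python
import math
from typing import List

Grid = List[List[float]]


def avg_pool_2x2(grid: Grid) -> Grid:
    """Downsample a saliency map by averaging non-overlapping 2x2 blocks (odd edges dropped)."""
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
             + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def bce(pred: Grid, target: Grid, eps: float = 1e-7) -> float:
    """Mean pixel-wise binary cross entropy between two maps with values in [0, 1]."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count


def generator_loss(pred: Grid, target: Grid, d_prob_on_pred: float,
                   alpha: float = 0.05, eps: float = 1e-7) -> float:
    """Hypothetical weighting: alpha * content BCE on downsampled maps + adversarial term.

    d_prob_on_pred is the discriminator's probability that the predicted map is
    a ground-truth map; the generator is rewarded for pushing it towards 1.
    """
    content = bce(avg_pool_2x2(pred), avg_pool_2x2(target))
    adversarial = -math.log(max(d_prob_on_pred, eps))
    return alpha * content + adversarial
```

The BCE term keeps predictions close to the ground-truth maps pixel by pixel, while the adversarial term penalizes maps the discriminator can tell apart from real ones.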

Comparing Fixed and Adaptive Computation Time for Recurrent Neural Networks

Mar 21, 2018
Daniel Fojo, Víctor Campos, Xavier Giro-i-Nieto

Adaptive Computation Time for Recurrent Neural Networks (ACT) is one of the most promising architectures for variable computation. ACT adapts to the input sequence by being able to look at each sample more than once and learning how many times it should do so. In this paper, we compare ACT to Repeat-RNN, a novel architecture based on repeating each sample a fixed number of times. Surprisingly, we found that Repeat-RNN performs as well as ACT in the selected tasks. Source code in TensorFlow and PyTorch is publicly available at https://imatge-upc.github.io/danifojo-2018-repeatrnn/

* Accepted as a workshop paper at ICLR 2018
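The difference between the two approaches can be sketched in a few lines of Python: Repeat-RNN feeds every input sample to the cell a fixed number of times, whereas ACT would learn that number per sample. The scalar Elman cell and its weights below are hypothetical stand-ins for a real RNN cell; only the fixed-repetition idea comes from the abstract.

```python
import math
from typing import List


def repeat_inputs(seq: List[float], repetitions: int) -> List[float]:
    """Repeat every sample of the sequence a fixed number of times (Repeat-RNN input)."""
    if repetitions < 1:
        raise ValueError("repetitions must be >= 1")
    return [x for x in seq for _ in range(repetitions)]


def run_rnn(seq: List[float], w_x: float = 0.5, w_h: float = 0.8, b: float = 0.0) -> List[float]:
    """Hypothetical scalar Elman cell: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = 0.0
    states = []
    for x in seq:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def repeat_rnn(seq: List[float], repetitions: int, **cell_params: float) -> List[float]:
    """Run the cell over the repeated sequence; keep the state after each sample's last repetition."""
    states = run_rnn(repeat_inputs(seq, repetitions), **cell_params)
    return states[repetitions - 1 :: repetitions]
```

Keeping only the state after the last repetition of each sample is one possible read-out; emitting every intermediate state would also work, depending on the task.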

Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

Feb 05, 2018
Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, Shih-Fu Chang

Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges such as slow inference, vanishing gradients and difficulty in capturing long-term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph that results from unfolding the RNN in time. We introduce the Skip RNN model, which extends existing RNN models by learning to skip state updates, thereby shortening the effective size of the computational graph. This model can also be encouraged to perform fewer state updates through a budget constraint. We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models. Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/ .

* Accepted as a conference paper at ICLR 2018
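The skipping mechanism can be illustrated with a forward-pass sketch. It assumes a scalar state, an update gate obtained by rounding an update probability, and a probability that is recomputed from the fresh state after an update or keeps accumulating while the state is copied; a budget penalty proportional to the number of updates is added. All weights are hypothetical, and training would additionally need a gradient surrogate for the rounding step, such as a straight-through estimator, which is omitted here.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def skip_rnn_forward(
    xs: List[float],
    w_x: float = 0.5,
    w_s: float = 0.8,
    w_p: float = 1.0,
    b_p: float = 0.0,
    budget_weight: float = 0.01,
) -> Tuple[List[float], List[int], float]:
    """Forward pass of a toy scalar Skip RNN (hypothetical weights, no training).

    Returns the state at every step, the binary update decisions u_t and the
    budget penalty budget_weight * sum(u_t).
    """
    s = 0.0            # recurrent state s_t
    u_tilde = 1.0      # update probability; 1.0 forces an update at the first step
    delta = 0.0        # last increment sigmoid(w_p * s + b_p)
    states: List[float] = []
    updates: List[int] = []
    for x in xs:
        u = 1 if u_tilde >= 0.5 else 0          # binarize the update probability
        if u:
            s = math.tanh(w_x * x + w_s * s)    # regular RNN state update
            delta = sigmoid(w_p * s + b_p)      # increment computed from the fresh state
            u_tilde = delta
        else:
            # Skip: the state is copied unchanged and the update probability keeps growing.
            u_tilde = u_tilde + min(delta, 1.0 - u_tilde)
        states.append(s)
        updates.append(u)
    budget_loss = budget_weight * sum(updates)
    return states, updates, budget_loss
```

With a strongly negative bias `b_p`, the increment is small and the model skips many consecutive steps before updating again, which is the behavior the budget constraint encourages.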

Detection-aided liver lesion segmentation using deep learning

Nov 29, 2017
Miriam Bellver, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Xavier Giro-i-Nieto, Jordi Torres, Luc Van Gool

A fully automatic technique for segmenting the liver and localizing its unhealthy tissues is a convenient tool for diagnosing hepatic diseases and assessing the response to the corresponding treatments. In this work we propose a method to segment the liver and its lesions from Computed Tomography (CT) scans using Convolutional Neural Networks (CNNs), which have proven effective in a variety of computer vision tasks, including medical imaging. The network that segments the lesions has a cascaded architecture, which first focuses on the liver region and then segments the lesions within it. Moreover, we train a detector to localize the lesions and mask the output of the segmentation network with the positive detections. The segmentation architecture is based on DRIU, a Fully Convolutional Network (FCN) with side outputs that operate on feature maps of different resolutions, so that it benefits from the multi-scale information learned by different stages of the network. The main contribution of this work is the use of a detector to localize the lesions, which we show helps remove false positives triggered by the segmentation network. Source code and models are available at https://imatge-upc.github.io/liverseg-2017-nipsws/ .

* NIPS 2017 Workshop on Machine Learning for Health (ML4H)
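The masking step can be sketched as a pixel-wise filter: a lesion pixel survives only if it falls inside the liver mask (the cascade) and inside at least one positive detection. The box format `(row0, col0, row1, col1)` and the helper names are hypothetical; the detector in the paper may use a different output representation, such as image patches.

```python
from typing import List, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # hypothetical (row0, col0, row1, col1), end-exclusive


def inside_any_box(r: int, c: int, boxes: List[Box]) -> bool:
    """True if pixel (r, c) lies inside at least one positive detection."""
    return any(r0 <= r < r1 and c0 <= c < c1 for r0, c0, r1, c1 in boxes)


def filter_lesions(lesion_mask: Mask, liver_mask: Mask, boxes: List[Box]) -> Mask:
    """Keep a lesion pixel only if it is inside the liver and inside a positive detection."""
    return [
        [
            1 if lesion_mask[r][c] and liver_mask[r][c] and inside_any_box(r, c, boxes) else 0
            for c in range(len(lesion_mask[0]))
        ]
        for r in range(len(lesion_mask))
    ]
```

Any lesion pixel the segmentation network fires on outside every detection is discarded, which is how the detector removes false positives.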

Saliency Weighted Convolutional Features for Instance Search

Nov 29, 2017
Eva Mohedano, Kevin McGuinness, Xavier Giro-i-Nieto, Noel E. O'Connor

This work explores attention models to weight the contribution of local convolutional representations for the instance search task. We present a retrieval framework based on bags of local convolutional features (BLCF) that benefits from saliency weighting to build an efficient image representation. The use of human visual attention models (saliency) allows significant improvements in retrieval performance without the need to conduct region analysis or spatial verification, and without requiring any feature fine-tuning. We investigate the impact of different saliency models, finding that higher performance on saliency benchmarks does not necessarily equate to improved performance when used in instance search tasks. The proposed approach outperforms the state of the art on the challenging INSTRE benchmark by a large margin, and provides similar performance on the Oxford and Paris benchmarks compared to more complex methods that use off-the-shelf representations. The source code used in this project is available at https://imatge-upc.github.io/salbow/
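The saliency weighting can be illustrated with a small bag-of-words sketch: each local convolutional feature is assigned to its nearest visual word and votes with its saliency value instead of a constant 1. The codebook, the per-location saliency values (assumed to be already resized to the feature grid) and the L2 normalization are assumptions for this example.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the closest visual word under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])),
    )


def saliency_weighted_bow(features: Sequence[Vector], saliency: Sequence[float],
                          codebook: Sequence[Vector]) -> List[float]:
    """L2-normalized bag of visual words where each local feature votes with its saliency."""
    hist = [0.0] * len(codebook)
    for feat, weight in zip(features, saliency):
        hist[nearest_word(feat, codebook)] += weight
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


def similarity(query: List[float], candidate: List[float]) -> float:
    """Dot product, equal to cosine similarity for L2-normalized histograms."""
    return sum(q * c for q, c in zip(query, candidate))
```

Ranking a database then amounts to sorting images by `similarity` to the query histogram; features in non-salient regions contribute little because their votes are scaled down.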

Cost-Effective Active Learning for Melanoma Segmentation

Nov 28, 2017
Marc Gorriz, Axel Carlier, Emmanuel Faure, Xavier Giro-i-Nieto

We propose a novel Active Learning framework capable of effectively training a convolutional neural network for semantic segmentation of medical images with a limited amount of labeled training data. Our contribution is a practical Cost-Effective Active Learning approach that uses dropout at test time as Monte Carlo sampling to model pixel-wise uncertainty and analyzes the image information to improve training performance. The source code of this project is available at https://marc-gorriz.github.io/CEAL-Medical-Image-Segmentation/ .

* NIPS ML4H 2017 workshop
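Test-time dropout as Monte Carlo sampling can be sketched with a toy per-pixel logistic model: running several stochastic forward passes with dropout kept on yields a per-pixel variance, and the images with the highest mean variance are proposed for annotation. The toy model, the use of variance as the uncertainty score and the top-k selection rule are illustrative assumptions, not the exact cost-effective criterion of the paper.

```python
import math
import random
from typing import List, Sequence

Pixel = Sequence[float]   # feature vector of one pixel (toy representation)
Image = List[Pixel]


def stochastic_forward(image: Image, weights: Sequence[float], drop_p: float,
                       rng: random.Random) -> List[float]:
    """One pass of a toy per-pixel logistic model with dropout left ON (0 <= drop_p < 1)."""
    probs = []
    for feats in image:
        z = 0.0
        for f, w in zip(feats, weights):
            if rng.random() >= drop_p:            # keep this input with prob 1 - drop_p
                z += f * w / (1.0 - drop_p)       # inverted-dropout rescaling
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


def mc_dropout_variance(image: Image, weights: Sequence[float], n_samples: int = 20,
                        drop_p: float = 0.5, seed: int = 0) -> List[float]:
    """Pixel-wise variance of the foreground probability over n_samples MC passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(image, weights, drop_p, rng) for _ in range(n_samples)]
    variances = []
    for i in range(len(image)):
        values = [s[i] for s in samples]
        mean = sum(values) / n_samples
        variances.append(sum((v - mean) ** 2 for v in values) / n_samples)
    return variances


def select_for_labeling(images: List[Image], weights: Sequence[float], k: int,
                        **mc_kwargs) -> List[int]:
    """Indices of the k unlabeled images with the highest mean pixel uncertainty."""
    scores = []
    for img in images:
        var = mc_dropout_variance(img, weights, **mc_kwargs)
        scores.append(sum(var) / len(var))
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)[:k]
```

With dropout disabled (`drop_p=0.0`) every pass is identical and the variance collapses to zero, which is why dropout must stay active at test time for this estimate.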

More cat than cute? Interpretable Prediction of Adjective-Noun Pairs

Aug 21, 2017
Delia Fernandez, Alejandro Woodward, Victor Campos, Xavier Giro-i-Nieto, Brendan Jou, Shih-Fu Chang

The increasing availability of affect-rich multimedia resources has bolstered interest in understanding sentiment and emotions in and from visual content. Adjective-noun pairs (ANP) are a popular mid-level semantic construct for capturing affect via visually detectable concepts such as "cute dog" or "beautiful landscape". Current state-of-the-art methods approach ANP prediction by considering each of these compound concepts as individual tokens, ignoring the underlying relationships in ANPs. This work aims at disentangling the contributions of the 'adjectives' and 'nouns' in the visual prediction of ANPs. Two specialised classifiers, one trained for detecting adjectives and another for nouns, are fused to predict 553 different ANPs. The resulting ANP prediction model is more interpretable as it allows us to study contributions of the adjective and noun components. Source code and models are available at https://imatge-upc.github.io/affective-2017-musa2/ .

* Oral paper at ACM Multimedia 2017 Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes (MUSA2)
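One simple way to see why separate adjective and noun classifiers make ANP prediction more interpretable is a late fusion that multiplies their probabilities and keeps both factors for inspection. This product rule is an assumption for illustration only; the paper fuses its two specialised classifiers into a model over 553 ANPs, and that fusion may be learned rather than fixed.

```python
from typing import Dict, List, Tuple

ANP = Tuple[str, str]  # (adjective, noun)


def rank_anps(adj_probs: Dict[str, float], noun_probs: Dict[str, float],
              anps: List[ANP]) -> List[Tuple[ANP, float, float, float]]:
    """Score each ANP as p(adjective) * p(noun), keeping both factors (hypothetical fusion).

    Returns (anp, score, adjective_prob, noun_prob) tuples sorted by score.
    """
    ranked = []
    for adj, noun in anps:
        pa, pn = adj_probs.get(adj, 0.0), noun_probs.get(noun, 0.0)
        ranked.append(((adj, noun), pa * pn, pa, pn))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def dominant_component(adj_prob: float, noun_prob: float) -> str:
    """Which classifier is more confident: 'noun' means 'more cat than cute'."""
    return "noun" if noun_prob > adj_prob else "adjective"
```

Because each ranked ANP carries its adjective and noun probabilities, one can check whether a prediction is driven mostly by the object or by its affective attribute.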
