Philip H. S. Torr

University of Oxford

Calibrating Deep Neural Networks using Focal Loss

Feb 21, 2020

Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip H. S. Torr, Puneet K. Dokania

Abstract: Miscalibration of Deep Neural Networks (DNNs), a mismatch between a model's confidence and its correctness, makes their predictions hard to rely on. Ideally, we want networks to be accurate, calibrated and confident. We show that, as opposed to the standard cross-entropy loss, focal loss (Lin et al., 2017) allows us to learn models that are already very well calibrated. When combined with temperature scaling, it yields state-of-the-art calibrated models whilst preserving accuracy. We provide a thorough analysis of the factors causing miscalibration, and use the insights we glean from this to justify the empirically excellent performance of focal loss. To facilitate the use of focal loss in practice, we also provide a principled approach to automatically select the hyperparameter involved in the loss function. We perform extensive experiments on a variety of computer vision and NLP datasets, and with a wide variety of network architectures, and show that our approach achieves state-of-the-art accuracy and calibration in almost all cases.
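
The two ingredients named in this abstract can be illustrated with a minimal sketch. It assumes the standard per-sample focal loss form from Lin et al. (2017), FL(p) = -(1 - p)^gamma * log(p), where p is the probability assigned to the true class, and the usual definition of temperature scaling (dividing logits by a scalar T before the softmax). The function names, the fixed `gamma`, and the example values are illustrative assumptions; the paper's automatic hyperparameter selection is not shown here.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Standard cross-entropy for one sample, given the true-class probability."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss for one sample: -(1 - p)^gamma * log(p).

    gamma = 0 recovers cross-entropy; larger gamma down-weights samples
    that are already predicted with high confidence.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def temperature_scale(logits, temperature):
    """Softmax over logits divided by a scalar temperature (hypothetical helper).

    Dividing by a positive scalar does not change the argmax, so accuracy
    is preserved while the confidence of the prediction is adjusted.
    """
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    for p in (0.6, 0.9, 0.99):
        print(p, cross_entropy(p), focal_loss(p, gamma=2.0))
    print(temperature_scale([2.0, 1.0, 0.1], temperature=2.0))
```

In this sketch the focal term shrinks the loss most for confident correct predictions, which is one intuition for why it discourages overconfidence; the abstract's own analysis should be consulted for the actual argument.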

Image-to-Image Translation with Text Guidance

Feb 12, 2020

Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz

Abstract: The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, which allows text descriptions to determine the visual attributes of synthetic images. We propose four key components: (1) the implementation of part-of-speech tagging to filter out non-semantic words in the given description, (2) the adoption of an affine combination module to effectively fuse text and image features from different modalities, (3) a novel refined multi-stage architecture to strengthen the differential ability of discriminators and the rectification ability of generators, and (4) a new structure loss to further improve discriminators to better distinguish real and synthetic images. Extensive experiments on the COCO dataset demonstrate that our method achieves superior performance on both visual realism and semantic consistency with given descriptions.

Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation

Feb 03, 2020

Hao Tang, Dan Xu, Yan Yan, Jason J. Corso, Philip H. S. Torr, Nicu Sebe

Abstract: We propose a novel model named Multi-Channel Attention Selection Generative Adversarial Network (SelectionGAN) for guided image-to-image translation, where we translate an input image into another while respecting an external semantic guidance. The proposed SelectionGAN explicitly utilizes the semantic guidance information and consists of two stages. In the first stage, the input image and the conditional semantic guidance are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results by using the proposed multi-scale spatial pooling & channel selection module and the multi-channel attention selection module. Moreover, uncertainty maps automatically learned from attention maps are used to guide the pixel loss for better network optimization. Exhaustive experiments on four challenging guided image-to-image translation tasks (face, hand, body and street view) demonstrate that our SelectionGAN is able to generate significantly better results than the state-of-the-art methods. Meanwhile, the proposed framework and modules are unified solutions and can be applied to solve other generation tasks, such as semantic image synthesis. The code is available at https://github.com/Ha0Tang/SelectionGAN.

* An extended version of a paper published in CVPR2019. arXiv admin note: substantial text overlap with arXiv:1904.06807

Unifying Training and Inference for Panoptic Segmentation

Jan 14, 2020

Qizhu Li, Xiaojuan Qi, Philip H. S. Torr

Abstract: We present an end-to-end network to bridge the gap between the training and inference pipelines for panoptic segmentation, a task that seeks to partition an image into semantic regions for "stuff" and object instances for "things". In contrast to recent works, our network exploits a parametrised, yet lightweight panoptic segmentation submodule, powered by an end-to-end learnt dense instance affinity, to capture the probability that any pair of pixels belongs to the same instance. This panoptic submodule gives rise to a novel propagation mechanism for panoptic logits and enables the network to output a coherent panoptic segmentation map for both "stuff" and "thing" classes, without any post-processing. Reaping the benefits of end-to-end training, our full system sets new records on the popular street scene dataset, Cityscapes, achieving 61.4 PQ with a ResNet-50 backbone using only the fine annotations. On the challenging COCO dataset, our ResNet-50-based network also delivers state-of-the-art accuracy of 43.4 PQ. Moreover, our network flexibly works with and without object mask cues, performing competitively under both settings, which is of interest for applications with computation budgets.

Rethinking Class Relations: Absolute-relative Few-shot Learning

Jan 12, 2020

Hongguang Zhang, Philip H. S. Torr, Hongdong Li, Songlei Jian, Piotr Koniusz

Abstract: The majority of existing few-shot learning methods describe image relations with {0,1} binary labels. However, such binary relations are insufficient to teach the network complicated real-world relations, due to the lack of decision smoothness. Furthermore, current few-shot learning models capture only the similarity via relation labels, but they are not exposed to class concepts associated with objects, which is likely detrimental to the classification performance due to underutilization of the available class labels. To paraphrase, while children learn the concept of tiger from a few examples with ease, and while they learn from comparisons of tiger to other animals, they are also taught the actual concept names. Thus, we hypothesize that in fact both similarity and class concept learning must be occurring simultaneously. With these observations at hand, we study the fundamental problem of simplistic class modeling in current few-shot learning, rethink the relations between class concepts, and propose a novel absolute-relative learning paradigm to fully take advantage of label information to refine the image representations and correct the relation understanding. Our proposed absolute-relative learning paradigm improves the performance of several state-of-the-art models on publicly available datasets.

Few-shot Action Recognition via Improved Attention with Self-supervision

Jan 12, 2020

Hongguang Zhang, Li Zhang, Xiaojuan Qi, Hongdong Li, Philip H. S. Torr, Piotr Koniusz

Abstract: Most existing few-shot learning methods in computer vision focus on class recognition given a few still images as the input. In contrast, this paper tackles the more challenging task of few-shot action recognition from video clips. We propose a simple framework which is both flexible and easy to implement. Our approach exploits joint spatial and temporal attention mechanisms in conjunction with self-supervised representation learning on videos. This design encourages the model to discover and encode spatial and temporal attention hotspots important during the similarity learning between dynamic video sequences for which locations of discriminative patterns vary in the spatio-temporal sense. Our method compares favorably with several state-of-the-art baselines on HMDB51, miniMIT and UCF101 datasets, demonstrating its superior performance.

Few-shot Learning with Multi-scale Self-supervision

Jan 06, 2020

Hongguang Zhang, Philip H. S. Torr, Piotr Koniusz

Abstract: Learning concepts from a limited number of datapoints is a challenging task usually addressed by so-called one- or few-shot learning. Recently, an application of second-order pooling in few-shot learning demonstrated its superior performance due to the aggregation step handling varying image resolutions without the need to modify CNNs to fit specific image sizes, yet capturing highly descriptive co-occurrences. However, using a single resolution per image (even if the resolution varies across a dataset) is suboptimal as the importance of image contents varies across the coarse-to-fine levels depending on the object and its class label, e.g., generic objects and scenes rely on their global appearance while fine-grained objects rely more on their localized texture patterns. Multi-scale representations are popular in image deblurring, super-resolution and image recognition but they have not been investigated in few-shot learning due to its relational nature complicating the use of standard techniques. In this paper, we propose a novel multi-scale relation network based on the properties of second-order pooling to estimate image relations in the few-shot setting. To optimize the model, we leverage a scale selector to re-weight scale-wise representations based on their second-order features. Furthermore, we propose to apply self-supervised scale prediction. Specifically, we leverage an extra discriminator to predict the scale labels and the scale discrepancy between pairs of images. Our model achieves state-of-the-art results on standard few-shot learning datasets.
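
The resolution-independence of second-order pooling mentioned in this abstract can be seen in a small sketch. It assumes the common formulation in which the pooled descriptor is the average outer product of the per-location feature vectors, so N locations of dimension d always yield a d x d matrix regardless of N. The function name and toy inputs are hypothetical, and any normalisation the authors may apply is not shown.

```python
def second_order_pool(features):
    """Average outer product of feature vectors (a sketch of second-order pooling).

    features: list of N feature vectors, each a list of d floats, e.g. one
    vector per spatial location of a CNN feature map.
    Returns a d x d co-occurrence matrix whose size does not depend on N.
    """
    n = len(features)
    d = len(features[0])
    pooled = [[0.0] * d for _ in range(d)]
    for f in features:
        for i in range(d):
            for j in range(d):
                pooled[i][j] += f[i] * f[j] / n
    return pooled


if __name__ == "__main__":
    # Two "images" with different numbers of spatial locations (4 vs 9),
    # both with 3-dimensional features.
    small = [[1.0, 0.0, 2.0]] * 4
    large = [[0.5, 1.0, 0.0]] * 9
    print(len(second_order_pool(small)), len(second_order_pool(large)))
```

Because the output shape depends only on the feature dimension, descriptors from inputs of different resolutions can be compared directly, which is the property the abstract relies on.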

AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks

Dec 28, 2019

Hao Tang, Hong Liu, Dan Xu, Philip H. S. Torr, Nicu Sebe

Abstract: State-of-the-art methods in unpaired image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data. Though the existing methods have achieved promising results, they still produce unsatisfactory artifacts, being able to convert low-level information while limited in transforming high-level semantics of input images. One possible reason is that generators do not have the ability to perceive the most discriminative semantic parts between the source and target domains, thus making the generated images low quality. In this paper, we propose new Attention-Guided Generative Adversarial Networks (AttentionGAN) for the unpaired image-to-image translation task. AttentionGAN can identify the most discriminative semantic objects and minimize changes of unwanted parts for semantic manipulation problems without using extra data and models. The attention-guided generators in AttentionGAN are able to produce attention masks via a built-in attention mechanism, and then fuse the generation output with the attention masks to obtain high-quality target images. Accordingly, we also design a novel attention-guided discriminator which only considers attended regions. Extensive experiments are conducted on several generative tasks, demonstrating that the proposed model is effective at generating sharper and more realistic images compared with existing competitive models. The source code for the proposed AttentionGAN is available at https://github.com/Ha0Tang/AttentionGAN.
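
One way to picture the mask-based fusion described in this abstract is a per-pixel convex combination of the generated content and the input image. The sketch below assumes that form (output = mask * generated + (1 - mask) * input, with mask values in [0, 1]); the abstract only states that the generation output is fused with attention masks, so the exact formula, the function name, and the flat pixel lists are illustrative assumptions rather than the authors' implementation.

```python
def fuse_with_attention(input_pixels, generated_pixels, attention_mask):
    """Blend generated content into the input using a soft attention mask.

    All three arguments are equal-length lists of floats (flattened images).
    Where the mask is 0 the input pixel is kept unchanged; where it is 1 the
    generated pixel is used; values in between interpolate linearly.
    """
    return [
        a * g + (1.0 - a) * x
        for x, g, a in zip(input_pixels, generated_pixels, attention_mask)
    ]


if __name__ == "__main__":
    source = [0.2, 0.8, 0.5]
    generated = [1.0, 0.0, 0.9]
    mask = [0.0, 1.0, 0.5]
    print(fuse_with_attention(source, generated, mask))
```

Under this assumption, regions with a mask near zero pass through untouched, which matches the stated goal of minimizing changes to unwanted parts of the image.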

* An extended version of a paper published in IJCNN2019. arXiv admin note: substantial text overlap with arXiv:1903.12296. Add more results

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

Dec 27, 2019

Hao Tang, Dan Xu, Yan Yan, Philip H. S. Torr, Nicu Sebe

Abstract: In this paper, we address the task of semantic-guided scene generation. One open challenge in scene generation is the difficulty of generating small objects and detailed local texture, which has been widely observed in global image-level generation methods. To tackle this issue, in this work we consider learning the scene generation in a local context, and correspondingly design a local class-specific generative network with semantic maps as guidance, which separately constructs and learns sub-generators concentrating on the generation of different classes, and is able to provide more scene details. To learn more discriminative class-specific feature representations for the local generation, a novel classification module is also proposed. To combine the advantages of both the global image-level and the local class-specific generation, a joint generation network is designed with an attention fusion module and a dual-discriminator structure embedded. Extensive experiments on two scene image generation tasks show superior generation performance of the proposed model. State-of-the-art results are established by large margins on both tasks and on challenging public benchmarks. The source code and trained models are available at https://github.com/Ha0Tang/LGGAN.

* 11 pages, 11 figures

Learning Regional Attraction for Line Segment Detection

Dec 18, 2019

Nan Xue, Song Bai, Fu-Dong Wang, Gui-Song Xia, Tianfu Wu, Liangpei Zhang, Philip H. S. Torr

Abstract: This paper presents regional attraction of line segment maps, and thereby poses the problem of line segment detection (LSD) as a problem of region coloring. Given a line segment map, the proposed regional attraction first establishes the relationship between line segments and regions in the image lattice. Based on this, the line segment map is equivalently transformed to an attraction field map (AFM), which can be remapped to a set of line segments without loss of information. Accordingly, we develop an end-to-end framework to learn attraction field maps for raw input images, followed by a squeeze module to detect line segments. Unlike existing works, the proposed detector properly handles the local ambiguity and does not rely on the accurate identification of edge pixels. Comprehensive experiments on the Wireframe dataset and the YorkUrban dataset demonstrate the superiority of our method. In particular, we achieve an F-measure of 0.831 on the Wireframe dataset, advancing the state-of-the-art performance by 10.3 percent.

* Accepted to IEEE TPAMI. arXiv admin note: text overlap with arXiv:1812.02122
