Tao Xiang

Boundary-sensitive Pre-training for Temporal Localization in Videos

Nov 24, 2020

Mengmeng Xu, Juan-Manuel Perez-Rua, Victor Escorcia, Brais Martinez, Xiatian Zhu, Li Zhang, Bernard Ghanem, Tao Xiang

Abstract: Many video analysis tasks require temporal localization and thus the detection of content changes. However, most existing models developed for these tasks are pre-trained on general video action classification tasks. This is because large-scale annotation of temporal boundaries in untrimmed videos is expensive, so no suitable datasets exist for temporal boundary-sensitive pre-training. In this paper, we investigate, for the first time, model pre-training for temporal localization by introducing a novel boundary-sensitive pretext (BSP) task. Instead of relying on costly manual annotations of temporal boundaries, we propose to synthesize temporal boundaries in existing video action classification datasets. With the synthesized boundaries, BSP can be conducted simply by classifying the boundary types. This enables the learning of video representations that are much more transferable to downstream temporal localization tasks. Extensive experiments show that the proposed BSP is superior and complementary to the existing action classification based pre-training counterpart, and achieves new state-of-the-art performance on several temporal localization tasks.

* 11 pages, 4 figures
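
The boundary synthesis described in the abstract above can be illustrated with a minimal sketch. It assumes a dataset given as a list of `(frames, label)` pairs; the function names and the two-way `same_class` / `different_class` label are hypothetical choices for illustration, and the boundary types used in the paper may differ.

```python
import random


def synthesize_boundary(clip_a, clip_b, rng):
    """Join a prefix of clip_a to a suffix of clip_b at a random cut point.

    Returns the spliced frame list and the index of the first frame taken
    from clip_b, which is the synthetic temporal boundary.
    """
    n = min(len(clip_a), len(clip_b))
    cut = rng.randint(1, n - 1)  # keep at least one frame on each side
    return clip_a[:cut] + clip_b[cut:n], cut


def make_sample(dataset, rng):
    """Build one pretext sample: (frames, boundary_index, boundary_type)."""
    (frames_a, label_a), (frames_b, label_b) = rng.sample(dataset, 2)
    frames, cut = synthesize_boundary(frames_a, frames_b, rng)
    # Hypothetical label set, used only for this illustration.
    boundary_type = "same_class" if label_a == label_b else "different_class"
    return frames, cut, boundary_type
```

A classifier trained to predict `boundary_type` (and, optionally, to regress `cut`) from `frames` would then receive boundary supervision without any manual annotation.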

The Hidden Vulnerability of Watermarking for Deep Neural Networks

Sep 18, 2020

Shangwei Guo, Tianwei Zhang, Han Qiu, Yi Zeng, Tao Xiang, Yang Liu

Abstract: Watermarking has shown its effectiveness in protecting the intellectual property of Deep Neural Networks (DNNs). Existing techniques usually embed a set of carefully-crafted sample-label pairs into the target model during the training process. Ownership verification is then performed by querying a suspicious model with those watermark samples and checking the prediction results. These watermarking solutions claim to be robust against model transformations, a claim that is challenged by this paper. We design a novel watermark removal attack, which can defeat state-of-the-art solutions without any prior knowledge of the adopted watermarking technique and training samples. We make two contributions in the design of this attack. First, we propose a novel preprocessing function, which embeds imperceptible patterns and performs spatial-level transformations over the input. This function can make the watermark sample unrecognizable by the watermarked model, while still maintaining the correct prediction results of normal samples. Second, we introduce a fine-tuning strategy using unlabelled and out-of-distribution samples, which can improve the model usability in an efficient manner. Extensive experimental results indicate that our proposed attack can effectively bypass existing watermarking solutions with very high success rates.
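
The verification step described in the abstract above (query a suspect model with watermark samples and check the predictions) can be sketched as follows. This is a toy illustration, not the paper's method: the lookup-table model, the `preprocess` stand-in, and the `threshold` parameter are all assumptions made here to show why an input transformation placed in front of the model can defeat verification.

```python
def verify_ownership(model, watermark_pairs, threshold=0.9):
    """Return (is_match, accuracy) for a suspect model on watermark pairs.

    `threshold` is a hypothetical decision rule, not taken from the paper.
    """
    hits = sum(1 for x, y in watermark_pairs if model(x) == y)
    accuracy = hits / len(watermark_pairs)
    return accuracy >= threshold, accuracy


# Toy watermarked model: memorizes the watermark pairs, predicts 0 otherwise.
watermark_pairs = [(("wm", i), i % 3) for i in range(10)]
_table = dict(watermark_pairs)


def watermarked_model(x):
    return _table.get(x, 0)


def preprocess(x):
    # Stand-in for an input transformation applied before the model.
    return ("shifted",) + x


def attacked_model(x):
    return watermarked_model(preprocess(x))
```

In this toy setup `verify_ownership(watermarked_model, watermark_pairs)` reports a match, while the same check on `attacked_model` does not, because the transformed inputs no longer trigger the memorized labels.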

Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval

Aug 11, 2020

Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song

Abstract: Sketch as an image search query is an ideal alternative to text in capturing fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of sketches that has been overlooked to date: they are hierarchical in terms of the levels of detail, since a person typically sketches up to various extents of detail to depict an object. This hierarchical structure is often visually distinct. We design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match a sketch with a photo at corresponding hierarchical levels. In particular, features from a sketch and a photo are enriched using cross-modal co-attention, coupled with hierarchical node fusion at every level to form a better embedding space in which to conduct retrieval. Experiments on common benchmarks show that our method outperforms state-of-the-art methods by a significant margin.

* Accepted for ORAL presentation in BMVC 2020

BézierSketch: A generative model for scalable vector sketches

Jul 14, 2020

Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, Yi-Zhe Song

Abstract: The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints. However, this leads to low-resolution image generation and a failure to model long sketches. In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best-fit Bézier curve. This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark.

* Accepted as poster at ECCV 2020
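
To see why a stroke stored as Bézier control points is resolution-independent, as the abstract above claims, here is a minimal sketch that evaluates a curve at any sampling density. It assumes cubic curves with four 2D control points; the degree used in the paper may differ, and the function names are illustrative.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights for degree 3.
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def render_stroke(control_points, num_samples):
    """Sample num_samples (>= 2) evenly spaced points along the curve."""
    return [
        cubic_bezier(*control_points, i / (num_samples - 1))
        for i in range(num_samples)
    ]
```

The same four control points can be rendered with 5 samples or 500, so a generator that outputs control points rather than waypoints produces sequences whose length does not grow with the output resolution.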

On Learning Semantic Representations for Million-Scale Free-Hand Sketches

Jul 07, 2020

Peng Xu, Yongye Huang, Tongtong Yuan, Tao Xiang, Timothy M. Hospedales, Yi-Zhe Song, Liang Wang

Abstract: In this paper, we study learning semantic representations for million-scale free-hand sketches. This is highly challenging due to the domain-unique traits of sketches: they are diverse, sparse, abstract, and noisy. We propose a dual-branch CNN-RNN network architecture to represent sketches, which simultaneously encodes both the static and temporal patterns of sketch strokes. Based on this architecture, we further explore learning sketch-oriented semantic representations in two challenging yet practical settings, i.e., hashing retrieval and zero-shot recognition on million-scale sketches. Specifically, we use our dual-branch architecture as a universal representation framework to design two sketch-specific deep models: (i) We propose a deep hashing model for sketch retrieval, where a novel hashing loss is specifically designed to accommodate both the abstract and messy traits of sketches. (ii) We propose a deep embedding model for sketch zero-shot recognition, by collecting a large-scale edge-map dataset and extracting a set of semantic vectors from edge-maps as the semantic knowledge for sketch zero-shot domain alignment. Both deep models are evaluated by comprehensive experiments on million-scale sketches and outperform the state-of-the-art competitors.

* arXiv admin note: substantial text overlap with arXiv:1804.01401
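
For context on the hashing retrieval setting mentioned in the abstract above, the lookup step common to hashing-based retrieval can be sketched as a Hamming-distance ranking over binary codes. This is a generic illustration only: it assumes codes stored as Python integers and does not reproduce the paper's network or its hashing loss.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def retrieve(query_code, database, k):
    """Return the k database items whose codes are closest to the query.

    `database` is a list of (item_id, code) pairs; ties keep insertion order.
    """
    ranked = sorted(database, key=lambda item: hamming(query_code, item[1]))
    return ranked[:k]
```

Because each comparison is an XOR followed by a bit count, ranking stays cheap even when the database holds millions of sketches.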

Learning to Generate Novel Domains for Domain Generalization

Jul 07, 2020

Kaiyang Zhou, Yongxin Yang, Timothy Hospedales, Tao Xiang

Abstract: This paper focuses on domain generalization (DG), the task of learning from multiple source domains a model that generalizes well to unseen domains. A main challenge for DG is that the available source domains often exhibit limited diversity, hampering the model's ability to learn to generalize. We therefore employ a data generator to synthesize data from pseudo-novel domains to augment the source domains. This explicitly increases the diversity of available training domains and leads to a more generalizable model. To train the generator, we model the distribution divergence between source and synthesized pseudo-novel domains using optimal transport, and maximize the divergence. To ensure that semantics are preserved in the synthesized data, we further impose cycle-consistency and classification losses on the generator. Our method, L2A-OT (Learning to Augment by Optimal Transport), outperforms current state-of-the-art DG methods on four benchmark datasets.

* To appear in ECCV'20

Egocentric Action Recognition by Video Attention and Temporal Context

Jul 03, 2020

Juan-Manuel Perez-Rua, Antoine Toisoul, Brais Martinez, Victor Escorcia, Li Zhang, Xiatian Zhu, Tao Xiang

Abstract: We present the submission of Samsung AI Centre Cambridge to the CVPR 2020 EPIC-Kitchens Action Recognition Challenge. In this challenge, action recognition is posed as the problem of simultaneously predicting a single 'verb' and 'noun' class label given an input trimmed video clip. That is, a 'verb' and a 'noun' together define a compositional 'action' class. The challenging aspects of this real-life action recognition task include small fast-moving objects, complex hand-object interactions, and occlusions. At the core of our submission is a recently proposed spatial-temporal video attention model, called 'W3' ('What-Where-When') attention [Perez-Rua et al., 2020]. We further introduce a simple yet effective contextual learning mechanism to model 'action' class scores directly from long-term temporal behaviour based on the 'verb' and 'noun' prediction scores. Our solution achieves strong performance on the challenge metrics without using object-specific reasoning or extra training data. In particular, our best solution with multimodal ensemble achieves the 2nd best position for 'verb', and 3rd best for 'noun' and 'action' on the Seen Kitchens test set.

* EPIC-Kitchens challenges@CVPR 2020
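
The compositional 'action' class described in the abstract above can be made concrete with a simple baseline that scores each (verb, noun) pair as the product of the two class probabilities. This independence baseline is an assumption chosen for illustration; it is not the contextual learning mechanism of the submission, which models action scores from long-term temporal behaviour.

```python
def action_scores(verb_probs, noun_probs):
    """Score every (verb, noun) pair as the product of its class probabilities."""
    return {
        (verb, noun): p_verb * p_noun
        for verb, p_verb in verb_probs.items()
        for noun, p_noun in noun_probs.items()
    }


def best_action(verb_probs, noun_probs):
    """Return the highest-scoring (verb, noun) pair."""
    scores = action_scores(verb_probs, noun_probs)
    return max(scores, key=scores.get)
```

Treating verbs and nouns as independent ignores which combinations actually occur in practice, which is the kind of gap a learned contextual model over the verb and noun scores aims to close.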

Differentially Private Decentralized Learning

Jun 14, 2020

Shangwei Guo, Tianwei Zhang, Tao Xiang, Yang Liu

Abstract: Decentralized learning has received great attention for its high efficiency and performance. In such systems, participants constantly exchange parameters with each other to train a shared model, which can put them at risk of data privacy leakage. Differential Privacy (DP) has been adopted to enhance the Stochastic Gradient Descent (SGD) algorithm. However, these approaches mainly focus on single-party learning, or centralized learning in the synchronous mode. In this paper, we design a novel DP-SGD algorithm for decentralized learning systems. The key contribution of our solution is a topology-aware optimization strategy, which leverages the unique network characteristics of decentralized systems to effectively reduce the noise scale and improve the model usability. In addition, we design a novel learning protocol for both synchronous and asynchronous decentralized systems by restricting the sensitivity of the SGD algorithm and maximizing the noise reduction. We formally analyze and prove the DP requirement of our proposed algorithms. Experimental evaluations demonstrate that our algorithm achieves a better trade-off between usability and privacy than prior works.
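
As background for the abstract above, a single step of the standard single-party DP-SGD recipe (clip each per-example gradient, add Gaussian noise to the sum, then average) can be sketched as follows. This is the generic baseline that the paper builds on, not its topology-aware decentralized algorithm; the parameter names are illustrative.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update on a list of weights.

    Clipping bounds each example's contribution (the sensitivity), and
    Gaussian noise with standard deviation noise_multiplier * max_norm is
    added to each coordinate of the summed gradient before averaging.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    n = len(clipped)
    sigma = noise_multiplier * max_norm
    noisy_mean = [
        (sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)) / n
        for i in range(len(weights))
    ]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]
```

The noise scale is tied to `max_norm`, so tightening the sensitivity bound or reducing the noise required directly improves the accuracy side of the privacy trade-off the abstract refers to.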

Long-Term Cloth-Changing Person Re-identification

May 27, 2020

Xuelin Qian, Wenxuan Wang, Li Zhang, Fangrui Zhu, Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue

Abstract: Person re-identification (Re-ID) aims to match a target person across camera views at different locations and times. Existing Re-ID studies focus on the short-term cloth-consistent setting, under which a person re-appears in different camera views with the same outfit. A discriminative feature representation learned by existing deep Re-ID models is thus dominated by the visual appearance of clothing. In this work, we focus on a much more difficult yet practical setting where person matching is conducted over a long duration, e.g., over days and months, and therefore inevitably under the new challenge of changing clothes. This problem, termed Long-Term Cloth-Changing (LTCC) Re-ID, is understudied due to the lack of large-scale datasets. The first contribution of this work is a new LTCC dataset containing people captured over a long period of time with frequent clothing changes. As a second contribution, we propose a novel Re-ID method specifically designed to address the cloth-changing challenge. Specifically, we consider that under cloth changes, soft biometrics such as body shape would be more reliable. We therefore introduce a shape embedding module as well as a cloth-elimination shape-distillation module, aiming to eliminate the now unreliable clothing appearance features and focus on the body shape information. Extensive experiments show that superior performance is achieved by the proposed model on the new LTCC dataset. The code and dataset will be available at https://naiq.github.io/LTCC_Perosn_ReID.html.

* 24 pages, 10 figures, 5 tables

Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention

Apr 02, 2020

Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu, Antoine Toisoul, Victor Escorcia, Tao Xiang

Abstract: Attentive video modeling is essential for action recognition in unconstrained videos due to their rich yet redundant information over space and time. However, introducing attention in a deep neural network for action recognition is challenging for two reasons. First, an effective attention module needs to learn what (objects and their local motion patterns), where (spatially), and when (temporally) to focus on. Second, a video attention module must be efficient because existing action recognition models already suffer from high computational cost. To address both challenges, a novel What-Where-When (W3) video attention module is proposed. Departing from existing alternatives, our W3 module models all three facets of video attention jointly. Crucially, it is extremely efficient by factorizing the high-dimensional video feature data into low-dimensional meaningful spaces (1D channel vector for 'what' and 2D spatial tensors for 'where'), followed by lightweight temporal attention reasoning. Extensive experiments show that our attention model brings significant improvements to existing action recognition models, achieving new state-of-the-art performance on a number of benchmarks.
