Yi-Zhe Song

Deep Sketch-Based Modeling: Tips and Tricks

Nov 12, 2020
Yue Zhong, Yulia Gryaditskaya, Honggang Zhang, Yi-Zhe Song

Deep image-based modeling has received considerable attention in recent years, yet the parallel problem of sketch-based modeling has only been briefly studied, often as a potential application. In this work, for the first time, we identify the main differences between sketch and image inputs: (i) style variance, (ii) imprecise perspective, and (iii) sparsity. We discuss why each of these differences can pose a challenge, and even make a certain class of image-based methods inapplicable. We study alternative solutions to address each of these differences. By doing so, we draw out a few important insights: (i) sparsity commonly results in an incorrect prediction of foreground versus background, (ii) diversity of human styles, if not taken into account, can lead to very poor generalization properties, and finally (iii) unless a dedicated sketching interface is used, one cannot expect sketches to match the perspective of a fixed viewpoint. Finally, we compare a set of representative deep single-image modeling solutions and show how their performance can be improved to tackle sketch input by taking into consideration the identified critical differences.
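
The paper is an analysis rather than a single algorithm, but the two data-side differences it highlights (style variance and sparsity) can be made concrete with a small stroke-level augmentation sketch. Everything below (the function name, parameters, and the (x, y) polyline stroke format) is an illustrative assumption, not the authors' method:

```python
import numpy as np

def augment_sketch(strokes, drop_prob=0.2, jitter=0.05, seed=None):
    """Illustrative augmentation for vector sketches.

    strokes: list of (N_i, 2) float arrays of (x, y) points, normalised to [0, 1].
    - Random stroke dropout mimics the sparsity of quick free-hand sketches.
    - A small random affine perturbation mimics style / viewpoint variance.
    """
    rng = np.random.default_rng(seed)
    # Random affine: slight rotation, anisotropic scale and translation.
    theta = rng.uniform(-0.1, 0.1)                      # radians
    scale = rng.uniform(1 - jitter, 1 + jitter, size=2)
    shift = rng.uniform(-jitter, jitter, size=2)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    out = []
    for s in strokes:
        if rng.random() < drop_prob and len(strokes) > 1:
            continue                                    # drop whole stroke -> sparser sketch
        out.append((s @ rot.T) * scale + shift)
    if not out:                                         # keep at least one stroke
        out.append((strokes[0] @ rot.T) * scale + shift)
    return out
```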

Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval

Aug 11, 2020
Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song

Sketch as an image search query is an ideal alternative to text in capturing fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of sketches that has been overlooked to date: they are hierarchical in terms of level of detail, in that a person typically sketches to varying extents of detail to depict an object. This hierarchical structure is often visually distinct. We design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels. In particular, features from a sketch and a photo are enriched using cross-modal co-attention, coupled with hierarchical node fusion at every level, to form a better embedding space in which to conduct retrieval. Experiments on common benchmarks show our method to outperform state-of-the-art approaches by a significant margin.

* Accepted for ORAL presentation in BMVC 2020 
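
A minimal sketch of the cross-modal co-attention ingredient described above, assuming pre-extracted region/stroke features of a shared dimension; the paper's hierarchy construction and node-fusion details are omitted, and all layer names are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalCoAttention(nn.Module):
    """Minimal co-attention: each modality attends over the other's regions/strokes.

    sketch_feats: (B, Ns, D) stroke/region features; photo_feats: (B, Np, D).
    """
    def __init__(self, dim):
        super().__init__()
        self.q_s = nn.Linear(dim, dim)
        self.q_p = nn.Linear(dim, dim)
        self.k_s = nn.Linear(dim, dim)
        self.k_p = nn.Linear(dim, dim)

    def forward(self, sketch_feats, photo_feats):
        d = sketch_feats.size(-1)
        # Sketch queries attend over photo keys, and vice versa.
        att_sp = torch.softmax(
            self.q_s(sketch_feats) @ self.k_p(photo_feats).transpose(1, 2) / d ** 0.5, dim=-1)
        att_ps = torch.softmax(
            self.q_p(photo_feats) @ self.k_s(sketch_feats).transpose(1, 2) / d ** 0.5, dim=-1)
        sketch_enriched = sketch_feats + att_sp @ photo_feats   # (B, Ns, D)
        photo_enriched = photo_feats + att_ps @ sketch_feats    # (B, Np, D)
        # Pool to single embeddings for retrieval.
        return (F.normalize(sketch_enriched.mean(1), dim=-1),
                F.normalize(photo_enriched.mean(1), dim=-1))
```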

BézierSketch: A generative model for scalable vector sketches

Jul 14, 2020
Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, Yi-Zhe Song

The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints. However, this leads to low-resolution image generation and a failure to model long sketches. In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse-graphics approach to stroke embedding that trains an encoder to embed each stroke as its best-fit Bézier curve. This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable, high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark.

* Accepted as poster at ECCV 2020 
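
The stroke-to-Bézier embedding at the heart of the method can be illustrated with a closed-form least-squares fit (the paper instead trains an encoder to predict the fit); a minimal NumPy sketch under a chord-length parameterisation assumption:

```python
import numpy as np

def fit_cubic_bezier(points):
    """Least-squares fit of a cubic Bézier curve to one stroke.

    points: (N, 2) array of stroke coordinates, N >= 4.
    Returns (4, 2) control points, which serve as a compact, scalable embedding.
    """
    points = np.asarray(points, dtype=float)
    # Chord-length parameter t in [0, 1] for each point.
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(points, axis=0), axis=1))]
    t = d / d[-1]
    # Cubic Bernstein basis matrix, shape (N, 4).
    B = np.stack([(1 - t) ** 3,
                  3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t),
                  t ** 3], axis=1)
    # Solve B @ C ~= points for the control points C.
    C, *_ = np.linalg.lstsq(B, points, rcond=None)
    return C

def eval_bezier(C, n=100):
    """Sample the fitted curve at n points (renderable at any resolution)."""
    t = np.linspace(0, 1, n)
    B = np.stack([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t), t ** 3], axis=1)
    return B @ C
```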

On Learning Semantic Representations for Million-Scale Free-Hand Sketches

Jul 07, 2020
Peng Xu, Yongye Huang, Tongtong Yuan, Tao Xiang, Timothy M. Hospedales, Yi-Zhe Song, Liang Wang

In this paper, we study learning semantic representations for million-scale free-hand sketches. This is highly challenging due to the domain-unique traits of sketches, e.g., they are diverse, sparse, abstract, and noisy. We propose a dual-branch CNN-RNN network architecture to represent sketches, which simultaneously encodes both the static and temporal patterns of sketch strokes. Based on this architecture, we further explore learning sketch-oriented semantic representations in two challenging yet practical settings, i.e., hashing retrieval and zero-shot recognition on million-scale sketches. Specifically, we use our dual-branch architecture as a universal representation framework to design two sketch-specific deep models: (i) a deep hashing model for sketch retrieval, where a novel hashing loss is specifically designed to accommodate both the abstract and messy traits of sketches; and (ii) a deep embedding model for sketch zero-shot recognition, for which we collect a large-scale edge-map dataset and extract a set of semantic vectors from edge maps as the semantic knowledge for zero-shot sketch domain alignment. Both deep models are evaluated by comprehensive experiments on million-scale sketches and outperform state-of-the-art competitors.

* arXiv admin note: substantial text overlap with arXiv:1804.01401 
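
A minimal sketch of a dual-branch CNN-RNN encoder of the kind described, with placeholder backbones and a simple late-fusion layer; the hashing and zero-shot heads, and the paper's exact architecture, are not reproduced here:

```python
import torch
import torch.nn as nn

class DualBranchSketchEncoder(nn.Module):
    """Illustrative dual-branch encoder: CNN over the rasterised sketch (static
    pattern) + RNN over the stroke sequence (temporal pattern), late-fused."""
    def __init__(self, embed_dim=256, point_dim=3):  # point_dim: (dx, dy, pen_state)
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        self.rnn = nn.GRU(point_dim, embed_dim, batch_first=True)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, raster, strokes):
        # raster: (B, 1, H, W); strokes: (B, T, point_dim)
        static = self.cnn(raster)
        _, h = self.rnn(strokes)          # h: (1, B, embed_dim)
        temporal = h.squeeze(0)
        return self.fuse(torch.cat([static, temporal], dim=-1))
```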

Sequential Learning for Domain Generalization

Apr 03, 2020
Da Li, Yongxin Yang, Yi-Zhe Song, Timothy Hospedales

In this paper we propose a sequential learning framework for Domain Generalization (DG), the problem of training a model that is robust to domain shift by design. Various DG approaches have been proposed with different motivating intuitions, but they typically optimize for a single step of domain generalization: training on one set of domains and generalizing to one other. Our sequential learning is inspired by the idea of lifelong learning, where accumulated experience means that learning the $n^{th}$ thing becomes easier than the $1^{st}$. In DG this means encountering a sequence of domains and, at each step, training to maximise performance on the next domain. The performance at domain $n$ then depends on the previous $n-1$ learning problems. Thus, backpropagating through the sequence means optimizing performance not just for the next domain, but for all following domains. Training on all such sequences of domains provides dramatically more 'practice' for a base DG learner compared to existing approaches, thus improving performance on a true testing domain. This strategy can be instantiated for different base DG algorithms, but we focus on its application to the recently proposed Meta-Learning Domain Generalization (MLDG). We show that for MLDG it leads to a simple-to-implement and fast algorithm that provides consistent performance improvement on a variety of DG benchmarks.

* tech report 
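
The backpropagate-through-a-domain-sequence idea can be sketched on a toy linear model; the inner/meta learning rates, loss, and single-tensor parameterisation below are illustrative assumptions, not the authors' implementation:

```python
import torch

def sequential_dg_step(w, domains, inner_lr=0.1, meta_lr=0.01):
    """One meta-update of the 'backprop through a domain sequence' idea.

    w: (D, C) weight tensor of a linear classifier with requires_grad=True.
    domains: list of (x, y) tensors, one pair per source domain.
    """
    order = torch.randperm(len(domains)).tolist()
    fast_w = w
    meta_loss = 0.0
    for i, j in zip(order[:-1], order[1:]):
        x_i, y_i = domains[i]
        # Inner step on domain i; keep the graph so the meta-loss reaches w.
        loss_i = torch.nn.functional.cross_entropy(x_i @ fast_w, y_i)
        (grad,) = torch.autograd.grad(loss_i, fast_w, create_graph=True)
        fast_w = fast_w - inner_lr * grad
        # Accumulate performance on the *next* domain in the sequence.
        x_j, y_j = domains[j]
        meta_loss = meta_loss + torch.nn.functional.cross_entropy(x_j @ fast_w, y_j)
    # Meta-update of the initial weights using the whole sequence.
    (meta_grad,) = torch.autograd.grad(meta_loss, w)
    with torch.no_grad():
        w -= meta_lr * meta_grad
    return meta_loss.item()
```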

Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches

Mar 10, 2020
Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Yi-Zhe Song, Zhanyu Ma, Jun Guo

Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works mainly tackle this problem by focusing on how to locate the most discriminative parts, more complementary parts, and parts of various granularities. However, less effort has been placed on identifying which granularities are the most discriminative and on how to fuse information across multiple granularities. In this work, we propose a novel framework for fine-grained visual classification to tackle these problems. In particular, we propose: (i) a novel progressive training strategy that adds new layers at each training step to exploit information based on the smaller-granularity information found at the last step and the previous stage; and (ii) a simple jigsaw puzzle generator to form images that contain information at different granularity levels. We obtain state-of-the-art performance on several standard FGVC benchmark datasets, where the proposed method consistently outperforms existing methods or delivers competitive results. The code will be available at https://github.com/RuoyiDu/PMG-Progressive-Multi-Granularity-Training.
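
A generic version of the jigsaw puzzle generator mentioned in (ii): split each image into an n x n grid of patches and reassemble them in random order. Granularity scheduling and its coupling with the progressive training stages are omitted:

```python
import torch

def jigsaw_generator(images, n):
    """Split each image into an n x n grid of patches and shuffle the patches.

    images: (B, C, H, W) with H and W divisible by n.
    """
    B, C, H, W = images.shape
    ph, pw = H // n, W // n
    # (B, C, n, ph, n, pw) -> (B, n*n, C, ph, pw)
    patches = images.reshape(B, C, n, ph, n, pw).permute(0, 2, 4, 1, 3, 5)
    patches = patches.reshape(B, n * n, C, ph, pw)
    perm = torch.randperm(n * n)
    patches = patches[:, perm]
    # Reassemble the shuffled patches back into an image.
    patches = patches.reshape(B, n, n, C, ph, pw).permute(0, 3, 1, 4, 2, 5)
    return patches.reshape(B, C, H, W)
```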

Mind the Gap: Enlarging the Domain Gap in Open Set Domain Adaptation

Mar 10, 2020
Dongliang Chang, Aneeshan Sain, Zhanyu Ma, Yi-Zhe Song, Jun Guo

Unsupervised domain adaptation aims to leverage labeled data from a source domain to learn a classifier for an unlabeled target domain. Among its many variants, open set domain adaptation (OSDA) is perhaps the most challenging, as it further assumes the presence of unknown classes in the target domain. In this paper, we study OSDA with a particular focus on enriching its ability to traverse larger domain gaps. Firstly, we show that existing state-of-the-art methods suffer a considerable performance drop in the presence of larger domain gaps, especially on a new dataset (PACS) that we re-purposed for OSDA. We then propose a novel framework specifically to address these larger domain gaps. The key insight lies in how we exploit the mutually beneficial information between two networks: (a) to separate samples of known and unknown classes, and (b) to maximize the domain confusion between the source and target domains without the influence of unknown samples. It follows that (a) and (b) mutually supervise each other and alternate until convergence. Extensive experiments are conducted on the Office-31, Office-Home, and PACS datasets, demonstrating the superiority of our method in comparison to other state-of-the-art approaches. Code is available at https://github.com/dongliangchang/Mutual-to-Separate/
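
One ingredient of the mutual-supervision idea can be sketched as a weighting of the domain-confusion loss by the other network's 'known-class' confidence, so that likely-unknown target samples do not drive domain alignment. This is an illustrative reading, not the authors' full framework:

```python
import torch
import torch.nn.functional as F

def weighted_domain_confusion_loss(domain_logits, domain_labels, class_probs, unknown_idx):
    """Down-weight likely-unknown samples in the domain-confusion objective.

    domain_logits: (N, 2) source/target logits from a domain discriminator.
    domain_labels: (N,) long tensor of domain labels (0 = source, 1 = target).
    class_probs:   (N, K+1) class posteriors from the other network, where
                   column `unknown_idx` is the unknown class.
    """
    known_weight = (1.0 - class_probs[:, unknown_idx]).detach()   # (N,)
    per_sample = F.cross_entropy(domain_logits, domain_labels, reduction="none")
    return (known_weight * per_sample).sum() / known_weight.sum().clamp(min=1e-6)
```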

Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval

Mar 05, 2020
Ayan Kumar Bhunia, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song

Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch. Its widespread applicability is, however, hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the fewest possible strokes. We further propose an on-the-fly design that starts retrieving as soon as the user starts drawing. To accomplish this, we devise a reinforcement-learning-based cross-modal retrieval framework that directly optimizes the rank of the ground-truth photo over a complete sketch-drawing episode. Additionally, we introduce a novel reward scheme that circumvents the problems related to irrelevant sketch strokes, and thus provides a more consistent rank list during retrieval. We achieve superior early-retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.

* IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020 
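
A minimal form of the per-step retrieval signal such an episode-level objective might use is the reciprocal rank of the ground-truth photo given the sketch drawn so far; the paper's actual reward scheme, which also handles irrelevant strokes, is not reproduced here:

```python
import torch
import torch.nn.functional as F

def rank_reward(partial_sketch_emb, true_photo_emb, gallery_embs):
    """Reward for one step of an on-the-fly retrieval episode.

    partial_sketch_emb: (D,) embedding of the sketch drawn so far.
    true_photo_emb:     (D,) embedding of the ground-truth photo.
    gallery_embs:       (M, D) embeddings of the photo gallery.
    Returns the reciprocal rank of the ground-truth photo (1.0 = rank 1).
    """
    q = F.normalize(partial_sketch_emb, dim=0)
    gallery = F.normalize(gallery_embs, dim=1)
    sims = gallery @ q                                   # (M,) cosine similarities
    true_sim = F.normalize(true_photo_emb, dim=0) @ q
    rank = 1 + (sims > true_sim).sum().item()            # 1 = best possible rank
    return 1.0 / rank
```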