Mingkui Tan

Intelligent Home 3D: Automatic 3D-House Design from Linguistic Descriptions Only

Mar 01, 2020
Qi Chen, Qi Wu, Rui Tang, Yuhan Wang, Shuai Wang, Mingkui Tan

Home design is a complex task that normally requires architects with professional skills and tools. It would be fascinating if one could produce a house plan intuitively, for example via natural language, without much knowledge of home design or experience with complex design tools. In this paper, we formulate this as a language-conditioned visual content generation problem that is further divided into a floor plan generation task and an interior texture (such as floor and wall) synthesis task. The only control signal of the generation process is the linguistic expression given by users that describes the house details. To this end, we propose a House Plan Generative Model (HPGM) that first translates the language input into a structural graph representation, then predicts the layout of rooms with a Graph Conditioned Layout Prediction Network (GC-LPN) and generates the interior texture with a Language Conditioned Texture GAN (LCT-GAN). With some post-processing, the final product of this task is a 3D house model. To train and evaluate our model, we build the first Text-to-3D House Model dataset.

* To appear in CVPR2020

Joint Wasserstein Distribution Matching

Mar 01, 2020
JieZhang Cao, Langyuan Mo, Qing Du, Yong Guo, Peilin Zhao, Junzhou Huang, Mingkui Tan

The joint distribution matching (JDM) problem, which aims to learn bidirectional mappings to match the joint distributions of two domains, occurs in many machine learning and computer vision applications. This problem, however, is very difficult due to two critical challenges: (i) it is often difficult to exploit sufficient information from the joint distribution to conduct the matching; (ii) the problem is hard to formulate and optimize. In this paper, relying on optimal transport theory, we propose to address the JDM problem by minimizing the Wasserstein distance between the joint distributions in two domains. However, the resultant optimization problem is still intractable. We then propose an important theorem to reduce the intractable problem to a simple optimization problem, and develop a novel method, called Joint Wasserstein Distribution Matching (JWDM), to solve it. In the experiments, we apply our method to unsupervised image translation and cross-domain video synthesis. Both qualitative and quantitative comparisons demonstrate the superior performance of our method over several state-of-the-art methods.

* This paper is accepted by Chinese Journal of Computers in 2020
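
For readers unfamiliar with the Wasserstein distance that the abstract above minimizes, the following is a minimal sketch of the simplest case only: the Wasserstein-1 distance between two equal-size sets of one-dimensional samples, which reduces to the mean absolute difference between sorted values. It is not the JWDM method; the function name `wasserstein_1d` and the toy data are illustrative assumptions.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    In one dimension the optimal transport plan pairs the i-th smallest
    value of xs with the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    total = sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted))
    return total / len(xs)


# Toy data (hypothetical): the second sample is the first shifted by 1.
shifted = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
# Sample order does not matter, only the empirical distribution.
same = wasserstein_1d([3.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

Matching joint distributions over images or videos, as the paper does, requires a learned formulation; this sketch only illustrates the quantity being minimized.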

Discrimination-aware Network Pruning for Deep Model Compression

Jan 04, 2020
Jing Liu, Bohan Zhuang, Zhuangwei Zhuang, Yong Guo, Junzhou Huang, Jinhui Zhu, Mingkui Tan

We study network pruning which aims to remove redundant channels/kernels and hence speed up the inference of deep networks. Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, while the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. In this paper, we propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power. Note that a channel often consists of a set of kernels. Besides the redundancy in channels, some kernels in a channel may also be redundant and fail to contribute to the discriminative power of the network, resulting in kernel level redundancy. To solve this, we propose a discrimination-aware kernel pruning (DKP) method to further compress deep networks by removing redundant kernels. To prevent DCP/DKP from selecting redundant channels/kernels, we propose a new adaptive stopping condition, which helps to automatically determine the number of selected channels/kernels and often results in more compact models with better performance. Extensive experiments on both image classification and face recognition demonstrate the effectiveness of our methods. For example, on ILSVRC-12, the resultant ResNet-50 model with 30% reduction of channels even outperforms the baseline model by 0.36% in terms of Top-1 accuracy. The pruned MobileNetV1 and MobileNetV2 achieve 1.93x and 1.42x inference acceleration on a mobile device, respectively, with negligible performance degradation. The source code and the pre-trained models are available at https://github.com/SCUT-AILab/DCP.

* 14 pages. Extended version of the NeurIPS paper arXiv:1810.11809
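
The idea of selecting channels until an adaptive stopping condition is met can be sketched roughly as follows. This is an illustration under stated assumptions, not the DCP implementation: the per-channel importance scores, the `loss_of` callback, and the relative-improvement stopping rule with tolerance `tol` are all hypothetical stand-ins.

```python
def select_channels(scores, loss_of, tol=0.2):
    """Greedily keep channels in decreasing importance order.

    scores  : hypothetical per-channel importance values
    loss_of : hypothetical callback mapping a list of kept channel
              indices to a loss value
    tol     : stop when the relative loss decrease falls to tol or below
    """
    order = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    selected = []
    prev_loss = loss_of(selected)
    for channel in order:
        candidate = selected + [channel]
        loss = loss_of(candidate)
        # Assumed stopping rule: adding this channel no longer helps enough.
        if prev_loss - loss <= tol * prev_loss:
            break
        selected = candidate
        prev_loss = loss
    return selected


# Toy setup (hypothetical): the loss drops with the total kept importance.
toy_scores = [0.9, 0.05, 0.6, 0.01]


def toy_loss(kept):
    return 1.0 - 0.5 * sum(toy_scores[c] for c in kept)


kept_channels = select_channels(toy_scores, toy_loss)
```

In this toy run the two high-scoring channels are kept and the loop stops before the low-scoring ones, so the number of kept channels is decided by the stopping rule rather than fixed in advance.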

Online Adaptive Asymmetric Active Learning with Limited Budgets

Nov 18, 2019
Yifan Zhang, Peilin Zhao, Shuaicheng Niu, Qingyao Wu, Jiezhang Cao, Junzhou Huang, Mingkui Tan

Online Active Learning (OAL) aims to manage an unlabeled data stream by selectively querying the labels of data. OAL is applicable to many real-world problems, such as anomaly detection in health care and finance. In these problems, there are two key challenges: the query budget is often limited, and the ratio between classes is highly imbalanced. In practice, it is quite difficult to handle an imbalanced unlabeled data stream when only a limited budget of labels can be queried for training. To solve this, previous OAL studies adopt either asymmetric losses or asymmetric queries (an isolated asymmetric strategy) to tackle the imbalance, and use first-order methods to optimize the cost-sensitive measure. However, the isolated strategy limits their performance under class imbalance, while first-order methods restrict their optimization performance. In this paper, we propose a novel Online Adaptive Asymmetric Active learning algorithm, based on a new asymmetric strategy (merging both the asymmetric loss and asymmetric query strategies) and second-order optimization. We theoretically analyze its mistake bound and cost-sensitive metric bounds. Moreover, to better balance performance and efficiency, we enhance our algorithm via a sketching technique, which significantly accelerates computation with only slight performance degradation. Promising results demonstrate the effectiveness and efficiency of the proposed methods.

* IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019
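
To make the notion of asymmetric querying under a limited budget concrete, here is a minimal sketch. It covers only the query side (not the asymmetric loss, the second-order update, or the sketching technique), and the query rule `delta / (delta + |margin|)` with class-dependent `delta` values is an assumption chosen for illustration, not taken from the paper.

```python
import random


def query_probability(margin, delta_pos=1.0, delta_neg=0.2):
    """Probability of requesting a label given the signed prediction margin.

    Assumed rule: a larger delta for predicted positives means the rare
    positive class is queried more readily at the same confidence.
    """
    delta = delta_pos if margin >= 0 else delta_neg
    return delta / (delta + abs(margin))


def run_stream(margins, budget, seed=0):
    """Return the stream positions whose labels were queried."""
    rng = random.Random(seed)
    queried = []
    for t, margin in enumerate(margins):
        if len(queried) >= budget:
            break  # the label budget is exhausted
        if rng.random() < query_probability(margin):
            queried.append(t)
    return queried


# Toy stream of hypothetical margins; zero margin means maximal uncertainty.
queried_positions = run_stream([0.0, 0.0, 0.0], budget=2)
```

Uncertain predictions (margin near zero) are always queried, confident ones rarely, and the loop stops once the budget is spent.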

NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Nov 18, 2019
Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace the redundant operations with the more computationally efficient ones (e.g., skip connection or directly removing the connection). Based on MDP, we learn NAT by exploiting reinforcement learning to obtain the optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed strategies, we apply NAT on both hand-crafted architectures and NAS based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods.

* This paper is accepted by NeurIPS 2019

Collaborative Unsupervised Domain Adaptation for Medical Image Diagnosis

Nov 17, 2019
Yifan Zhang, Ying Wei, Peilin Zhao, Shuaicheng Niu, Qingyao Wu, Mingkui Tan, Junzhou Huang

Deep learning based medical image diagnosis has shown great potential in clinical medicine. However, it often suffers from two major difficulties in practice: 1) only limited labeled samples are available due to the expensive annotation costs of medical images; 2) labeled images may contain considerable label noise (e.g., mislabeled samples) due to diagnostic difficulties. In this paper, we seek to exploit rich labeled data from relevant domains to help learning in the target task with unsupervised domain adaptation (UDA). Unlike most existing UDA methods, which rely on clean labeled data or assume samples are equally transferable, we propose a novel Collaborative Unsupervised Domain Adaptation algorithm to conduct transferability-aware domain adaptation and conquer label noise in a cooperative way. Promising empirical results verify the superiority of the proposed method.

* Medical Imaging meets NeurIPS, 2019

Multi-marginal Wasserstein GAN

Nov 03, 2019
Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan

The multiple marginal matching problem aims at learning mappings to match a source domain to multiple target domains, and it has attracted great attention in many applications, such as multi-domain image translation. However, addressing this problem has two critical challenges: (i) measuring the multi-marginal distance among different domains is intractable; (ii) it is very difficult to exploit cross-domain correlations to match the target domain distributions. In this paper, we propose a novel Multi-marginal Wasserstein GAN (MWGAN) to minimize the Wasserstein distance among domains. Specifically, with the help of multi-marginal optimal transport theory, we develop a new adversarial objective function with inner- and inter-domain constraints to exploit cross-domain correlations. Moreover, we theoretically analyze the generalization performance of MWGAN, and empirically evaluate it on balanced and imbalanced translation tasks. Extensive experiments on toy and real-world datasets demonstrate the effectiveness of MWGAN.

* This paper is accepted by NeurIPS 2019

Structured Binary Neural Networks for Image Recognition

Sep 22, 2019
Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid

We propose methods to train convolutional neural networks (CNNs) with both binarized weights and activations, leading to quantized models that are particularly friendly to mobile devices with limited power capacity and computation resources. Previous works on quantizing CNNs often seek to approximate the floating-point information using a set of discrete values, which we call value approximation, typically assuming the same architecture as the full-precision networks. Here we take a novel "structure approximation" view of quantization: it is very likely that different architectures designed for low-bit networks may be better for achieving good performance. In particular, we propose a "network decomposition" strategy, termed Group-Net, in which we divide the network into groups. Each full-precision group can then be effectively reconstructed by aggregating a set of homogeneous binary branches. In addition, we learn effective connections among groups to improve the representation capability. Moreover, the proposed Group-Net shows strong generalization to other tasks. For instance, we extend Group-Net for accurate semantic segmentation by embedding rich context into the binary structure. Furthermore, for the first time, we apply binary neural networks to object detection. Experiments on classification, semantic segmentation and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature. Our methods outperform the previous best binary neural networks in terms of accuracy and computation efficiency.

* 15 pages. Extended version of the conference version arXiv:1811.10413

Graph Convolutional Networks for Temporal Action Localization

Sep 07, 2019
Runhao Zeng, Wenbing Huang, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang, Chuang Gan

Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting the relations between proposals during learning. However, these relations play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and the relation between two proposals as an edge. Here, we use two types of relations, one for capturing the context information for each proposal and the other for characterizing the correlations between distinct actions. Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for action classification and localization. Experimental results show that our approach significantly outperforms the state-of-the-art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships. Code is available at https://github.com/Alvin-Zeng/PGCN.

* ICCV 2019
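
The proposal graph described in the abstract above can be sketched in a few lines. This is a simplified illustration, not the released PGCN code: it assumes proposals are `(start, end)` intervals, connects two proposals when their temporal IoU exceeds a hypothetical threshold (only one kind of relation), and replaces the learned graph convolution with a plain mean over each node and its neighbours.

```python
def temporal_iou(a, b):
    """Temporal intersection-over-union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def build_edges(proposals, threshold=0.5):
    """Connect proposal pairs whose temporal IoU exceeds the threshold."""
    edges = set()
    for i in range(len(proposals)):
        for j in range(i + 1, len(proposals)):
            if temporal_iou(proposals[i], proposals[j]) > threshold:
                edges.add((i, j))
    return edges


def aggregate(features, edges):
    """One propagation step: average each node with its neighbours.

    A real GCN layer would also apply learned weights and a nonlinearity.
    """
    neighbours = [[i] for i in range(len(features))]  # include self-loop
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    return [
        [sum(features[k][d] for k in group) / len(group)
         for d in range(len(features[i]))]
        for i, group in enumerate(neighbours)
    ]


# Toy proposals in seconds (hypothetical) with one-dimensional features.
proposals = [(0.0, 10.0), (1.0, 10.0), (20.0, 30.0)]
edges = build_edges(proposals)
updated = aggregate([[1.0], [3.0], [5.0]], edges)
```

The two overlapping proposals share information after the step, while the isolated third proposal keeps its original feature.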
