Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Improving Neural Network Quantization using Outlier Channel Splitting

Jan 28, 2019
Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

Quantization can improve the execution latency and energy efficiency of neural networks on both commodity GPUs and specialized accelerators. The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training. DNN weights and activations follow a bell-shaped distribution post-training, while practical hardware uses a linear quantization grid. This leads to challenges in dealing with outliers in the distribution. Prior work has addressed this by clipping the outliers or using specialized hardware. In this work, we propose outlier channel splitting (OCS), which duplicates channels containing outliers, then halves the channel values. The network remains functionally identical, but affected outliers are moved toward the center of the distribution. OCS requires no additional training and works on commodity hardware. Experimental evaluation on ImageNet classification and language modeling shows that OCS can outperform state-of-the-art clipping techniques with only minor overhead.

* 10 pages 

  Access Paper or Ask Questions

Spatio-Temporal Road Scene Reconstruction using Superpixel MRF

Nov 27, 2018
Yaochen Li, Yuehu Liu, Jihua Zhu, Shiqi Ma, Zhenning Niu, Rui Guo

Scene models construction based on image rendering is a hot topic in the computer vision community. In this paper, we propose a framework to construct road scene models based on 3D corridor structures. The construction of scene models consists of two successive stages: road detection and scene construction. The road detection is implemented via a new superpixel Markov random field (MRF) algorithm. The data fidelity term of the energy function is jointly computed using the superpixel features of color, texture and location. The smoothness term is defined by the interaction of spatio-temporally adjacent superpixels. The control points of road boundaries are generated with the constraint of vanishing point. Subsequently, the road scene models are constructed, where the foreground and background regions are modeled independently. Numerous applications are developed based on the proposed framework, e.g., traffic scenes simulation. The experiments and comparisons are conducted for both the road detection and scene construction stages, which prove the effectiveness of the proposed method.

  Access Paper or Ask Questions

The 30-Year Cycle In The AI Debate

Oct 08, 2018
Jean-Marie Chauvet

In the last couple of years, the rise of Artificial Intelligence and the successes of academic breakthroughs in the field have been inescapable. Vast sums of money have been thrown at AI start-ups. Many existing tech companies -- including the giants like Google, Amazon, Facebook, and Microsoft -- have opened new research labs. The rapid changes in these everyday work and entertainment tools have fueled a rising interest in the underlying technology itself; journalists write about AI tirelessly, and companies -- of tech nature or not -- brand themselves with AI, Machine Learning or Deep Learning whenever they get a chance. Confronting squarely this media coverage, several analysts are starting to voice concerns about over-interpretation of AI's blazing successes and the sometimes poor public reporting on the topic. This paper reviews briefly the track-record in AI and Machine Learning and finds this pattern of early dramatic successes, followed by philosophical critique and unexpected difficulties, if not downright stagnation, returning almost to the clock in 30-year cycles since 1958.

* 31 pages, 5 tables 

  Access Paper or Ask Questions

Progressive Structure from Motion

Jul 10, 2018
Alex Locher, Michal Havlena, Luc Van Gool

Structure from Motion or the sparse 3D reconstruction out of individual photos is a long studied topic in computer vision. Yet none of the existing reconstruction pipelines fully addresses a progressive scenario where images are only getting available during the reconstruction process and intermediate results are delivered to the user. Incremental pipelines are capable of growing a 3D model but often get stuck in local minima due to wrong (binding) decisions taken based on incomplete information. Global pipelines on the other hand need the access to the complete viewgraph and are not capable of delivering intermediate results. In this paper we propose a new reconstruction pipeline working in a progressive manner rather than in a batch processing scheme. The pipeline is able to recover from failed reconstructions in early stages, avoids to take binding decisions, delivers a progressive output and yet maintains the capabilities of existing pipelines. We demonstrate and evaluate our method on diverse challenging public and dedicated datasets including those with highly symmetric structures and compare to the state of the art.

* Accepted to ECCV 2018 

  Access Paper or Ask Questions

Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation

Jun 24, 2018
Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang

Hand pose estimation from a single depth image is an essential topic in computer vision and human computer interaction. Despite recent advancements in this area promoted by convolutional neural network, accurate hand pose estimation is still a challenging problem. In this paper we propose a Pose guided structured Region Ensemble Network (Pose-REN) to boost the performance of hand pose estimation. The proposed method extracts regions from the feature maps of convolutional neural network under the guide of an initially estimated pose, generating more optimal and representative features for hand pose estimation. The extracted feature regions are then integrated hierarchically according to the topology of hand joints by employing tree-structured fully connections. A refined estimation of hand pose is directly regressed by the proposed network and the final hand pose is obtained by utilizing an iterative cascaded method. Comprehensive experiments on public hand pose datasets demonstrate that our proposed method outperforms state-of-the-art algorithms.

* Accepted by Neurocomputing 

  Access Paper or Ask Questions

Human and Smart Machine Co-Learning with Brain Computer Interface

Feb 19, 2018
Chang-Shing Lee, Mei-Hui Wang, Li-Wei Ko, Naoyuki Kubota, Lu-An Lin, Shinya Kitaoka, Yu-Te Wang, Shun-Feng Su

Machine learning has become a very popular approach for cybernetics systems, and it has always been considered important research in the Computational Intelligence area. Nevertheless, when it comes to smart machines, it is not just about the methodologies. We need to consider systems and cybernetics as well as include human in the loop. The purpose of this article is as follows: (1) To integrate the open source Facebook AI Research (FAIR) DarkForest program of Facebook with Item Response Theory (IRT), to the new open learning system, namely, DDF learning system; (2) To integrate DDF Go with Robot namely Robotic DDF Go system; (3) To invite the professional Go players to attend the activity to play Go games on site with a smart machine. The research team will apply this technology to education, such as, playing games to enhance the children concentration on learning mathematics, languages, and other topics. With the detected brainwaves, the robot will be able to speak some words that are very much to the point for the students and to assist the teachers in classroom in the future.

* This article will be published in IEEE SMC Magazine, vol. 4, no. 2, 2018 

  Access Paper or Ask Questions

Complex-valued image denosing based on group-wise complex-domain sparsity

Nov 01, 2017
Vladimir Katkovnik, Mykola Ponomarenko, Karen Egiazarian

Phase imaging and wavefront reconstruction from noisy observations of complex exponent is a topic of this paper. It is a highly non-linear problem because the exponent is a 2{\pi}-periodic function of phase. The reconstruction of phase and amplitude is difficult. Even with an additive Gaussian noise in observations distributions of noisy components in phase and amplitude are signal dependent and non-Gaussian. Additional difficulties follow from a prior unknown correlation of phase and amplitude in real life scenarios. In this paper, we propose a new class of non-iterative and iterative complex domain filters based on group-wise sparsity in complex domain. This sparsity is based on the techniques implemented in Block-Matching 3D filtering (BM3D) and 3D/4D High-Order Singular Decomposition (HOSVD) exploited for spectrum design, analysis and filtering. The introduced algorithms are a generalization of the ideas used in the CD-BM3D algorithms presented in our previous publications. The algorithms are implemented as a MATLAB Toolbox. The efficiency of the algorithms is demonstrated by simulation tests.

* Submitted to Signal Processing 

  Access Paper or Ask Questions

Learning Action Concept Trees and Semantic Alignment Networks from Image-Description Data

Sep 08, 2016
Jiyang Gao, Ram Nevatia

Action classification in still images has been a popular research topic in computer vision. Labelling large scale datasets for action classification requires tremendous manual work, which is hard to scale up. Besides, the action categories in such datasets are pre-defined and vocabularies are fixed. However humans may describe the same action with different phrases, which leads to the difficulty of vocabulary expansion for traditional fully-supervised methods. We observe that large amounts of images with sentence descriptions are readily available on the Internet. The sentence descriptions can be regarded as weak labels for the images, which contain rich information and could be used to learn flexible expressions of action categories. We propose a method to learn an Action Concept Tree (ACT) and an Action Semantic Alignment (ASA) model for classification from image-description data via a two-stage learning process. A new dataset for the task of learning actions from descriptions is built. Experimental results show that our method outperforms several baseline methods significantly.

* 16 pages, 5 figures 

  Access Paper or Ask Questions

Unifying Decision Trees Split Criteria Using Tsallis Entropy

Aug 23, 2016
Yisen Wang, Chaobing Song, Shu-Tao Xia

The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. A lot of heuristic algorithms have been proposed to construct near-optimal decision trees. ID3, C4.5 and CART are classical decision tree algorithms and the split criteria they used are Shannon entropy, Gain Ratio and Gini index respectively. All the split criteria seem to be independent, actually, they can be unified in a Tsallis entropy framework. Tsallis entropy is a generalization of Shannon entropy and provides a new approach to enhance decision trees' performance with an adjustable parameter $q$. In this paper, a Tsallis Entropy Criterion (TEC) algorithm is proposed to unify Shannon entropy, Gain Ratio and Gini index, which generalizes the split criteria of decision trees. More importantly, we reveal the relations between Tsallis entropy with different $q$ and other split criteria. Experimental results on UCI data sets indicate that the TEC algorithm achieves statistically significant improvement over the classical algorithms.

* 6 pages, 2 figures 

  Access Paper or Ask Questions

Convolutional Feature Masking for Joint Object and Stuff Segmentation

Apr 02, 2015
Jifeng Dai, Kaiming He, Jian Sun

The topic of semantic segmentation has witnessed considerable progress due to the powerful features learned by convolutional neural networks (CNNs). The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions. This strategy introduces artificial boundaries on the images and may impact the quality of the extracted features. Besides, the operations on the raw image domain require to compute thousands of networks on a single image, which is time-consuming. In this paper, we propose to exploit shape information via masking convolutional features. The proposal segments (e.g., super-pixels) are treated as masks on the convolutional feature maps. The CNN features of segments are directly masked out from these maps and used to train classifiers for recognition. We further propose a joint method to handle objects and "stuff" (e.g., grass, sky, water) in the same framework. State-of-the-art results are demonstrated on benchmarks of PASCAL VOC and new PASCAL-CONTEXT, with a compelling computational speed.

* IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015 

  Access Paper or Ask Questions