Mao Ye

Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection

Mar 03, 2020
Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu

Recent empirical works show that large deep neural networks are often highly redundant and one can find much smaller subnetworks without a significant drop of accuracy. However, most existing methods of network pruning are empirical and heuristic, leaving it open whether good subnetworks provably exist, how to find them efficiently, and whether network pruning can be provably better than direct training using gradient descent. We answer these questions positively by proposing a simple greedy selection approach for finding good subnetworks, which starts from an empty network and greedily adds important neurons from the large network. This differs from the existing methods based on backward elimination, which remove redundant neurons from the large network. Theoretically, applying our greedy selection strategy to sufficiently large pre-trained networks is guaranteed to find small subnetworks with lower loss than networks directly trained with gradient descent. Practically, we improve on the prior art in network pruning for learning compact neural architectures on ImageNet, including ResNet, MobileNetV2/V3, and ProxylessNet. Our theory and empirical results on MobileNet suggest that we should fine-tune the pruned subnetworks to leverage the information from the large model, instead of re-training from a new random initialization as suggested by Liu et al. (2018).
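
The forward-selection idea in this abstract can be illustrated with a minimal sketch. It assumes a one-hidden-layer setting in which the subnetwork output is the plain average of the selected neurons' outputs, a squared loss, and selection with repeats allowed; the function names and these modelling choices are hypothetical and are not taken from the paper.

```python
import numpy as np

def greedy_forward_selection(neuron_outputs, targets, k):
    """Start from an empty network and add one neuron per step.

    neuron_outputs: array of shape (N, m), output of each of N neurons on m examples.
    targets: array of shape (m,).
    Returns the list of selected neuron indices (repeats are allowed here).
    """
    selected = []
    running_sum = np.zeros(targets.shape, dtype=float)
    for step in range(1, k + 1):
        # Prediction of the subnetwork if each candidate neuron were added next.
        candidates = (running_sum[None, :] + neuron_outputs) / step
        losses = np.mean((candidates - targets[None, :]) ** 2, axis=1)
        best = int(np.argmin(losses))
        selected.append(best)
        running_sum += neuron_outputs[best]
    return selected

def subnetwork_loss(neuron_outputs, targets, selected):
    """Mean squared error of the averaged output of the selected neurons."""
    prediction = neuron_outputs[selected].mean(axis=0)
    return float(np.mean((prediction - targets) ** 2))
```

Backward elimination would instead start from all N neurons and drop one at a time; the sketch only shows the forward direction described above.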

Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

Feb 24, 2020
Xingchao Liu, Mao Ye, Dengyong Zhou, Qiang Liu

We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight using a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure, and adaptively decide the number of low-precision points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an 'effect of mixed precision' but without physical mixed precision implementations (which require specialized hardware accelerators). Empirically, our method can be implemented by common operands, bringing almost no memory and computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and that it can be generalized to more challenging tasks like PASCAL VOC object detection.
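
As a rough illustration of approximating a weight vector by a linear combination of low-bit vectors, the sketch below greedily fits one low-bit point at a time to the remaining residual. The symmetric integer grid, the least-squares coefficient, and the stopping rule on the weight error (rather than on the output error mentioned in the abstract) are simplifying assumptions, and all names are hypothetical.

```python
import numpy as np

def quantize_to_grid(v, bits):
    """Map v onto a symmetric integer grid {-L, ..., L} with L = 2**(bits-1) - 1."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(v))
    if max_abs == 0:
        return np.zeros_like(v)
    return np.clip(np.round(v / max_abs * levels), -levels, levels)

def multipoint_quantize(w, bits=2, max_points=4, tol=1e-3):
    """Greedily build w ~ sum_i a_i * q_i, where each q_i is a low-bit vector."""
    residual = np.asarray(w, dtype=float).copy()
    coeffs, points = [], []
    for _ in range(max_points):
        if np.linalg.norm(residual) <= tol:
            break  # adaptive stop: this vector is already approximated well enough
        q = quantize_to_grid(residual, bits)
        a = (residual @ q) / (q @ q)  # least-squares scale for this point
        coeffs.append(a)
        points.append(q)
        residual = residual - a * q
    return coeffs, points

def reconstruct(coeffs, points, size):
    """Rebuild the approximation from its low-bit points and scales."""
    approx = np.zeros(size)
    for a, q in zip(coeffs, points):
        approx += a * q
    return approx
```

Because each scale is a least-squares fit, the residual norm never grows when a point is added, so vectors that are harder to approximate simply receive more points.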

Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework

Feb 21, 2020
Dinghuai Zhang, Mao Ye, Chengyue Gong, Zhanxing Zhu, Qiang Liu

Randomized classifiers have been shown to provide a promising approach for achieving certified robustness against adversarial attacks in deep learning. However, most existing methods only leverage Gaussian smoothing noise and only work for $\ell_2$ perturbation. We propose a general framework of adversarial certification with non-Gaussian noise and for more general types of attacks, from a unified functional optimization perspective. Our new framework allows us to identify a key trade-off between accuracy and robustness via designing smoothing distributions, helping to design new families of non-Gaussian smoothing distributions that work more efficiently for different $\ell_p$ settings, including $\ell_1$, $\ell_2$ and $\ell_\infty$ attacks. Our proposed methods achieve better certification results than previous works and provide a new perspective on randomized smoothing certification.
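
For readers unfamiliar with randomized smoothing, the following sketch shows only the prediction side: a smoothed classifier returns the class that a base classifier outputs most often under random input noise, and the noise sampler is pluggable so Gaussian and non-Gaussian choices can be compared. It is a Monte Carlo illustration under assumed names (`smoothed_predict`, `gaussian_noise`, `laplace_noise`); it does not compute a certificate and is not the paper's framework.

```python
import numpy as np

def gaussian_noise(sigma):
    """Return a sampler drawing isotropic Gaussian noise of a given shape."""
    return lambda rng, shape: rng.normal(0.0, sigma, size=shape)

def laplace_noise(scale):
    """Return a sampler drawing i.i.d. Laplace noise, one non-Gaussian option."""
    return lambda rng, shape: rng.laplace(0.0, scale, size=shape)

def smoothed_predict(base_classifier, x, sample_noise, n_samples=1000,
                     num_classes=2, rng=None):
    """Estimate the smoothed prediction by majority vote over noisy copies of x.

    base_classifier maps an input array to an integer class label.
    Returns (predicted label, empirical class frequencies).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        counts[base_classifier(x + sample_noise(rng, x.shape))] += 1
    return int(np.argmax(counts)), counts / n_samples
```

Swapping `gaussian_noise` for `laplace_noise` changes only the smoothing distribution, which is the design axis the abstract describes.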

Stein Self-Repulsive Dynamics: Benefits From Past Samples

Feb 21, 2020
Mao Ye, Tongzheng Ren, Qiang Liu

We propose a new Stein self-repulsive dynamics for obtaining diversified samples from intractable un-normalized distributions. Our idea is to introduce Stein variational gradient as a repulsive force to push the samples of Langevin dynamics away from the past trajectories. This simple idea allows us to significantly decrease the auto-correlation in Langevin dynamics and hence increase the effective sample size. Importantly, as we establish in our theoretical analysis, the asymptotic stationary distribution remains correct even with the addition of the repulsive force, thanks to the special properties of the Stein variational gradient. We perform extensive empirical studies of our new algorithm, showing that our method yields much higher sample efficiency and better uncertainty estimation than vanilla Langevin dynamics.
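
A minimal sketch of the idea follows: an ordinary Langevin step is augmented with a Stein variational gradient term computed against a thinned window of past samples. The RBF kernel, the fixed bandwidth, the thinning scheme, and the parameter names (`alpha`, `n_past`, `thin`) are assumptions made for illustration rather than the authors' settings.

```python
import numpy as np

def stein_direction(x, past, grad_log_p, bandwidth=1.0):
    """Stein variational gradient at x with respect to past samples (RBF kernel).

    Averages k(y, x) * grad_log_p(y) + grad_y k(y, x) over the rows y of `past`.
    The second term points from y towards x, i.e. it repels x from past samples.
    """
    diffs = x[None, :] - past                                    # (M, d)
    k = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * bandwidth ** 2))
    grads = np.array([grad_log_p(y) for y in past])              # (M, d)
    drive = k[:, None] * grads
    repulse = k[:, None] * diffs / bandwidth ** 2
    return np.mean(drive + repulse, axis=0)

def self_repulsive_langevin(grad_log_p, x0, n_steps, step=0.05, alpha=1.0,
                            n_past=10, thin=10, seed=0):
    """Langevin dynamics plus a repulsive term from thinned past samples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = [x.copy()]
    for _ in range(n_steps):
        drift = grad_log_p(x)
        if len(samples) >= n_past * thin:
            # Use every `thin`-th sample from the most recent window.
            past = np.array(samples[-n_past * thin::thin])
            drift = drift + alpha * stein_direction(x, past, grad_log_p)
        x = x + step * drift + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)
```

For a standard Gaussian target one could call `self_repulsive_langevin(lambda y: -y, np.zeros(2), n_steps=1000)`; setting `alpha=0` recovers plain Langevin dynamics for comparison.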

MaxUp: A Simple Way to Improve Generalization of Neural Network Training

Feb 20, 2020
Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu

We propose MaxUp, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with some random perturbations or transforms and minimize the maximum, or worst-case, loss over the augmented data. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generalization performance. For example, in the case of Gaussian perturbation, MaxUp is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test MaxUp on a range of tasks, including image classification, language modeling, and adversarial certification, on which MaxUp consistently outperforms the existing best baseline methods, without introducing substantial computational overhead. In particular, we improve ImageNet classification from the state-of-the-art top-1 accuracy of 85.5% without extra data to 85.8%. Code will be released soon.
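
The worst-case objective described above can be sketched in a few lines: draw several augmented copies of each example, compute the per-example loss on every copy, keep the largest, and average over the batch. The helper names, the squared loss, and the Gaussian augmentation are assumptions chosen to keep the example small.

```python
import numpy as np

def maxup_loss(loss_fn, params, x, y, augment, m=4, seed=0):
    """Average over examples of the maximum loss across m augmented copies.

    loss_fn(params, x, y) must return one loss value per example, shape (n,).
    augment(rng, x) must return a randomly perturbed copy of the batch x.
    """
    rng = np.random.default_rng(seed)
    per_copy = np.stack([loss_fn(params, augment(rng, x), y) for _ in range(m)])
    return float(np.mean(np.max(per_copy, axis=0)))

def squared_loss(w, x, y):
    """Per-example squared error of a linear model with weights w."""
    return (x @ w - y) ** 2

def gaussian_augment(sigma):
    """Return an augmentation that adds Gaussian noise of scale sigma."""
    return lambda rng, x: x + sigma * rng.standard_normal(x.shape)
```

With `m=1` this reduces to ordinary training on randomly augmented data; larger `m` increasingly emphasizes the hardest copy of each example.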

Extended Stochastic Gradient MCMC for Large-Scale Bayesian Variable Selection

Feb 07, 2020
Qifan Song, Yan Sun, Mao Ye, Faming Liang

Stochastic gradient Markov chain Monte Carlo (MCMC) algorithms have received much attention in Bayesian computing for big data problems, but they are only applicable to a small class of problems for which the parameter space has a fixed dimension and the log-posterior density is differentiable with respect to the parameters. This paper proposes an extended stochastic gradient MCMC algorithm which, by introducing appropriate latent variables, can be applied to more general large-scale Bayesian computing problems, such as those involving dimension jumping and missing data. Numerical studies show that the proposed algorithm is highly scalable and much more efficient than traditional MCMC algorithms. The proposed algorithm substantially eases the burden of applying Bayesian methods in big data computing.

Distribution-Aware Coordinate Representation for Human Pose Estimation

Oct 14, 2019
Feng Zhang, Xiatian Zhu, Hanbin Dai, Mao Ye, Ce Zhu

While being the de facto standard coordinate representation in human pose estimation, the heatmap has never been systematically investigated in the literature, to the best of our knowledge. This work fills this gap by studying the coordinate representation with a particular focus on the heatmap. Interestingly, we found that the process of decoding the predicted heatmaps into the final joint coordinates in the original image space is surprisingly significant for human pose estimation performance, which nevertheless has not been recognised before. In light of the discovered importance, we further probe the design limitations of the standard coordinate decoding method widely used by existing methods, and propose a more principled distribution-aware decoding method. Meanwhile, we improve the standard coordinate encoding process (i.e. transforming ground-truth coordinates to heatmaps) by generating accurate heatmap distributions for unbiased model training. Taking the two together, we formulate a novel Distribution-Aware coordinate Representation of Keypoint (DARK) method. Serving as a model-agnostic plug-in, DARK significantly improves the performance of a variety of state-of-the-art human pose estimation models. Extensive experiments show that DARK yields the best results on two common benchmarks, MPII and COCO, consistently validating the usefulness and effectiveness of our novel coordinate representation idea.

* Results on the COCO keypoint detection challenge: 78.9% AP on the test-dev set (Top-1 on the leaderboard as of 12 Oct 2019) and 76.4% AP on the test-challenge set. Project page: https://ilovepose.github.io/coco
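
To make the heatmap decoding step concrete, the sketch below refines the integer argmax of a heatmap to a sub-pixel location. It assumes the heatmap is approximately Gaussian near its peak, so that a single Newton step on the log-heatmap (using finite-difference derivatives) lands on the mode. This is one illustrative way to exploit the heatmap's distribution and is not claimed to be the authors' exact procedure; `decode_subpixel` is a hypothetical name.

```python
import numpy as np

def decode_subpixel(heatmap, eps=1e-10):
    """Return (x, y) of the heatmap mode with sub-pixel precision."""
    h = np.log(np.maximum(heatmap, eps))
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Fall back to the integer location when the peak touches the border.
    if not (0 < y < h.shape[0] - 1 and 0 < x < h.shape[1] - 1):
        return float(x), float(y)
    # Central finite differences of the log-heatmap at the peak.
    dx = 0.5 * (h[y, x + 1] - h[y, x - 1])
    dy = 0.5 * (h[y + 1, x] - h[y - 1, x])
    dxx = h[y, x + 1] - 2 * h[y, x] + h[y, x - 1]
    dyy = h[y + 1, x] - 2 * h[y, x] + h[y - 1, x]
    dxy = 0.25 * (h[y + 1, x + 1] - h[y + 1, x - 1]
                  - h[y - 1, x + 1] + h[y - 1, x - 1])
    hessian = np.array([[dxx, dxy], [dxy, dyy]])
    grad = np.array([dx, dy])
    if abs(np.linalg.det(hessian)) < 1e-12:
        return float(x), float(y)
    # Newton step: offset = -H^{-1} g moves from the argmax to the estimated mode.
    offset = -np.linalg.solve(hessian, grad)
    return float(x + offset[0]), float(y + offset[1])
```

On a noise-free Gaussian heatmap the log is exactly quadratic, so the step recovers the true centre; on real predictions some smoothing of the heatmap beforehand would likely be needed.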

Fast Human Pose Estimation

Nov 13, 2018
Feng Zhang, Xiatian Zhu, Mao Ye

Existing human pose estimation approaches often only consider how to improve the model generalisation performance, while putting aside the significant efficiency problem. This leads to the development of heavy models with poor scalability and cost-effectiveness in practical use. In this work, we investigate the under-studied but practically critical pose model efficiency problem. To this end, we present a new Fast Pose Distillation (FPD) model learning strategy. Specifically, FPD trains a lightweight pose neural network architecture capable of executing rapidly with low computational cost by effectively transferring the pose structure knowledge of a strong teacher. Extensive evaluations demonstrate the advantages of our FPD method over a broad range of state-of-the-art pose estimation approaches in terms of model cost-effectiveness on the standard benchmark datasets, MPII Human Pose and Leeds Sports Pose.
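
Transferring knowledge from a teacher to a lightweight student is commonly expressed as a combined training loss. The sketch below assumes heatmap outputs, mean squared error for both terms, and a single mixing weight `alpha`; these choices and the function name are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np

def distillation_loss(student, teacher, ground_truth, alpha=0.5):
    """Weighted sum of a teacher-mimicry term and a supervised term.

    student, teacher, ground_truth: heatmap arrays of identical shape,
    for example (num_joints, height, width).
    """
    mimic = np.mean((student - teacher) ** 2)            # match the teacher's heatmaps
    supervised = np.mean((student - ground_truth) ** 2)  # match the labels
    return float(alpha * mimic + (1.0 - alpha) * supervised)
```

Setting `alpha=0` gives ordinary supervised training, while `alpha=1` trains the student purely to imitate the teacher.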

Stein Neural Sampler

Oct 08, 2018
Tianyang Hu, Zixiang Chen, Hanxi Sun, Jincheng Bai, Mao Ye, Guang Cheng

We propose two novel samplers to produce high-quality samples from a given (un-normalized) probability density. The sampling is achieved by transforming a reference distribution to the target distribution with neural networks, which are trained separately by minimizing two kinds of Stein discrepancies; hence our method is named the Stein neural sampler. Theoretical and empirical results suggest that, compared with traditional sampling schemes, our samplers share the following three advantages: (1) being asymptotically correct; (2) experiencing fewer convergence issues in practice; (3) generating samples instantaneously.

Do Convolutional Neural Networks Learn Class Hierarchy?

Oct 17, 2017
Bilal Alsallakh, Amin Jourabloo, Mao Ye, Xiaoming Liu, Liu Ren

Convolutional Neural Networks (CNNs) currently achieve state-of-the-art accuracy in image classification. With a growing number of classes, the accuracy usually drops as the possibilities of confusion increase. Interestingly, the class confusion patterns follow a hierarchical structure over the classes. We present visual-analytics methods to reveal and analyze this hierarchy of similar classes in relation to CNN-internal data. We found that this hierarchy not only dictates the confusion patterns between the classes but also dictates the learning behavior of CNNs. In particular, the early layers in these networks develop feature detectors that can separate high-level groups of classes quite well, even after a few training epochs. In contrast, the later layers require substantially more epochs to develop specialized feature detectors that can separate individual classes. We demonstrate how these insights are key to significant improvement in accuracy by designing hierarchy-aware CNNs that accelerate model convergence and alleviate overfitting. We further demonstrate how our methods help in identifying various quality issues in the training data.

* IEEE Transactions on Visualization and Computer Graphics, Volume: 24, Issue: 1 (2018)
* Video demo at https://vimeo.com/228263798
