Get our free extension to see links to code for papers anywhere online!Free extension: code links for papers anywhere!Free add-on: See code for papers anywhere!

Alireza Ganjdanesh, Shangqian Gao, Hirad Alipanah, Heng Huang

Generative Adversarial Networks (GANs) have shown remarkable success in modeling complex data distributions for image-to-image translation. Still, their high computational demands prohibit their deployment in practical scenarios like edge devices. Existing GAN compression methods mainly rely on knowledge distillation or convolutional classifiers' pruning techniques. Thus, they neglect the critical characteristic of GANs: their local density structure over their learned manifold. Accordingly, we approach GAN compression from a new perspective by explicitly encouraging the pruned model to preserve the density structure of the original parameter-heavy model on its learned manifold. We facilitate this objective for the pruned model by partitioning the learned manifold of the original generator into local neighborhoods around its generated samples. Then, we propose a novel pruning objective to regularize the pruned model to preserve the local density structure over each neighborhood, resembling the kernel density estimation method. Also, we develop a collaborative pruning scheme in which the discriminator and generator are pruned by two pruning agents. We design the agents to capture interactions between the generator and discriminator by exchanging their peer's feedback when determining corresponding models' architectures. Thanks to such a design, our pruning method can efficiently find performant sub-networks and can maintain the balance between the generator and discriminator more effectively compared to baselines during pruning, thereby showing more stable pruning dynamics. Our experiments on image translation GAN models, Pix2Pix and CycleGAN, with various benchmark datasets and architectures demonstrate our method's effectiveness.

Via

Minchul Kim, Shangqian Gao, Yen-Chang Hsu, Yilin Shen, Hongxia Jin

Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs. However, their computational overhead, largely attributed to the self-attention mechanism, makes deployment on resource-constrained edge devices challenging. Multiple solutions rely on token pruning or token merging. In this paper, we introduce "Token Fusion" (ToFu), a method that amalgamates the benefits of both token pruning and token merging. Token pruning proves advantageous when the model exhibits sensitivity to input interpolations, while token merging is effective when the model manifests close to linear responses to inputs. We combine this to propose a new scheme called Token Fusion. Moreover, we tackle the limitations of average merging, which doesn't preserve the intrinsic feature norm, resulting in distributional shifts. To mitigate this, we introduce MLERP merging, a variant of the SLERP technique, tailored to merge multiple tokens while maintaining the norm distribution. ToFu is versatile, applicable to ViTs with or without additional training. Our empirical evaluations indicate that ToFu establishes new benchmarks in both classification and image generation tasks concerning computational efficiency and model accuracy.

Via

Alireza Ganjdanesh, Shangqian Gao, Heng Huang

Convolutional Neural Networks (CNNs) compression is crucial to deploying these models in edge devices with limited resources. Existing channel pruning algorithms for CNNs have achieved plenty of success on complex models. They approach the pruning problem from various perspectives and use different metrics to guide the pruning process. However, these metrics mainly focus on the model's `outputs' or `weights' and neglect its `interpretations' information. To fill in this gap, we propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process, thereby utilizing information from both inputs and outputs of the model. However, existing interpretation methods cannot get deployed to achieve our goal as either they are inefficient for pruning or may predict non-coherent explanations. We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models. We parameterize the distribution of explanatory masks by Radial Basis Function (RBF)-like functions to incorporate geometric prior of natural images in our selector model's inductive bias. Thus, we can obtain compact representations of explanations to reduce the computational costs of our pruning method. We leverage our selector model to steer the network pruning by maximizing the similarity of explanatory representations for the pruned and original models. Extensive experiments on CIFAR-10 and ImageNet benchmark datasets demonstrate the efficacy of our proposed method. Our implementations are available at \url{https://github.com/Alii-Ganjj/InterpretationsSteeredPruning}

Via

Feihu Huang, Shangqian Gao, Heng Huang

In this paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques. Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm based on the basic momentum technique and mirror descent iteration. At the same time, we present an accelerated Bregman gradient policy optimization (VR-BGPO) algorithm based on a momentum variance-reduced technique. Moreover, we introduce a convergence analysis framework for our Bregman gradient policy optimization under the nonconvex setting. Specifically, we prove that BGPO achieves the sample complexity of $\tilde{O}(\epsilon^{-4})$ for finding $\epsilon$-stationary point only requiring one trajectory at each iteration, and VR-BGPO reaches the best known sample complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, which also only requires one trajectory at each iteration. In particular, by using different Bregman divergences, our methods unify many existing policy optimization algorithms and their new variants such as the existing (variance-reduced) policy gradient algorithms and (variance-reduced) natural policy gradient algorithms. Extensive experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithms.

Via

Feihu Huang, Shangqian Gao, Heng Huang

In the paper, we study a class of useful non-convex minimax optimization problems on the Riemanian manifold and propose a class of Riemanian gradient descent ascent algorithms to solve these minimax problems. Specifically, we propose a new Riemannian gradient descent ascent (RGDA) algorithm for the deterministic minimax optimization. Moreover, we prove that the RGDA has a sample complexity of $O(\kappa^2\epsilon^{-2})$ for finding an $\epsilon$-stationary point of the nonconvex strongly-concave minimax problems, where $\kappa$ denotes the condition number. At the same time, we introduce a Riemannian stochastic gradient descent ascent (RSGDA) algorithm for the stochastic minimax optimization. In the theoretical analysis, we prove that the RSGDA can achieve a sample complexity of $O(\kappa^4\epsilon^{-4})$. To further reduce the sample complexity, we propose a novel momentum variance-reduced Riemannian stochastic gradient descent ascent (MVR-RSGDA) algorithm based on a new momentum variance-reduced technique of STORM. We prove that the MVR-RSGDA algorithm achieves a lower sample complexity of $\tilde{O}(\kappa^{4}\epsilon^{-3})$ without large batches, which reaches near the best known sample complexity for its Euclidean counterparts. This is the first study of the minimax optimization over the Riemannian manifold. Extensive experimental results on the robust deep neural networks training over Stiefel manifold demonstrate the efficiency of our proposed algorithms.

Via

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

In the paper, we propose a new accelerated zeroth-order momentum (Acc-ZOM) method to solve the non-convex stochastic mini-optimization problems. We prove that the Acc-ZOM method achieves a lower query complexity of $O(d^{3/4}\epsilon^{-3})$ for finding an $\epsilon$-stationary point, which improves the best known result by a factor of $O(d^{1/4})$ where $d$ denotes the parameter dimension. The Acc-ZOM does not require any batches compared to the large batches required in the existing zeroth-order stochastic algorithms. Further, we extend the Acc-ZOM method to solve the non-convex stochastic minimax-optimization problems and propose an accelerated zeroth-order momentum descent ascent (Acc-ZOMDA) method. We prove that the Acc-ZOMDA method reaches the best know query complexity of $\tilde{O}(\kappa_y^3(d_1+d_2)^{3/2}\epsilon^{-3})$ for finding an $\epsilon$-stationary point, where $d_1$ and $d_2$ denote dimensions of the mini and max optimization parameters respectively and $\kappa_y$ is condition number. In particular, our theoretical result does not rely on large batches required in the existing methods. Moreover, we propose a momentum-based accelerated framework for the minimax-optimization problems. At the same time, we present an accelerated momentum descent ascent (Acc-MDA) method for solving the white-box minimax problems, and prove that it achieves the best known gradient complexity of $\tilde{O}(\kappa_y^3\epsilon^{-3})$ without large batches. Extensive experimental results on the black-box adversarial attack to deep neural networks (DNNs) and poisoning attack demonstrate the efficiency of our algorithms.

Via

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

In the paper, we propose a class of efficient momentum-based policy gradient methods for the model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast important-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance reduced technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance reduced technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the non-concave performance function, which only require one trajectory at each iteration. In particular, we present a non-adaptive version of IS-MBPG method, i.e., IS-MBPG*, which also reaches the best known sample complexity of $O(\epsilon^{-3})$ without any large batches. In the experiments, we apply four benchmark tasks to demonstrate the effectiveness of our algorithms.

Via

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Zeroth-order (gradient-free) method is a class of powerful optimization tool for many machine learning problems because it only needs function values (not gradient) in the optimization. In particular, zeroth-order method is very suitable for many complex problems such as black-box attacks and bandit feedback, whose explicit gradients are difficult or infeasible to obtain. Recently, although many zeroth-order methods have been developed, these approaches still exist two main drawbacks: 1) high function query complexity; 2) not being well suitable for solving the problems with complex penalties and constraints. To address these challenging drawbacks, in this paper, we propose a novel fast zeroth-order stochastic alternating direction method of multipliers (ADMM) method (\emph{i.e.}, ZO-SPIDER-ADMM) with lower function query complexity for solving nonconvex problems with multiple nonsmooth penalties. Moreover, we prove that our ZO-SPIDER-ADMM has the optimal function query complexity of $O(dn + dn^{\frac{1}{2}}\epsilon^{-1})$ for finding an $\epsilon$-approximate local solution, where $n$ and $d$ denote the sample size and dimension of data, respectively. In particular, the ZO-SPIDER-ADMM improves the existing best nonconvex zeroth-order ADMM methods by a factor of $O(d^{\frac{1}{3}}n^{\frac{1}{6}})$. Moreover, we propose a fast online ZO-SPIDER-ADMM (\emph{i.e.,} ZOO-SPIDER-ADMM). Our theoretical analysis shows that the ZOO-SPIDER-ADMM has the function query complexity of $O(d\epsilon^{-\frac{3}{2}})$, which improves the existing best result by a factor of $O(\epsilon^{-\frac{1}{2}})$. Finally, we utilize a task of structured adversarial attack on black-box deep neural networks to demonstrate the efficiency of our algorithms.

Via

Feihu Huang, Shangqian Gao, Songcan Chen, Heng Huang

Alternating direction method of multipliers (ADMM) is a popular optimization tool for the composite and constrained problems in machine learning. However, in many machine learning problems such as black-box attacks and bandit feedback, ADMM could fail because the explicit gradients of these problems are difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can effectively solve these problems due to that the objective function values are only required in the optimization. Recently, though there exist a few zeroth-order ADMM methods, they build on the convexity of objective function. Clearly, these existing zeroth-order methods are limited in many applications. In the paper, thus, we propose a class of fast zeroth-order stochastic ADMM methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems with multiple nonsmooth penalties, based on the coordinate smoothing gradient estimator. Moreover, we prove that both the ZO-SVRG-ADMM and ZO-SAGA-ADMM have convergence rate of $O(1/T)$, where $T$ denotes the number of iterations. In particular, our methods not only reach the best convergence rate $O(1/T)$ for the nonconvex optimization, but also are able to effectively solve many complex machine learning problems with multiple regularized penalties and constraints. Finally, we conduct the experiments of black-box binary classification and structured adversarial attack on black-box deep neural network to validate the efficiency of our algorithms.

Via