Yangyang Xu

Deep Texture-Aware Features for Camouflaged Object Detection

Feb 05, 2021
Jingjing Ren, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Yangyang Xu, Weiming Wang, Zijun Deng, Pheng-Ann Heng

Camouflaged object detection is a challenging task that aims to identify objects whose texture is similar to their surroundings. This paper proposes to amplify the subtle texture difference between camouflaged objects and the background by formulating multiple texture-aware refinement modules that learn texture-aware features in a deep convolutional neural network. The texture-aware refinement module computes the covariance matrices of feature responses to extract the texture information, designs an affinity loss to learn a set of parameter maps that help to separate the texture between camouflaged objects and the background, and adopts a boundary-consistency loss to explore the object detail structures. We evaluate our network on the benchmark dataset for camouflaged object detection both qualitatively and quantitatively. Experimental results show that our approach outperforms various state-of-the-art methods by a large margin.
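The abstract mentions computing covariance matrices of feature responses to capture texture. A minimal NumPy sketch of that one step is given below, assuming a single feature map of shape (C, H, W); the function name `channel_covariance` and the random input are hypothetical, and this is not the authors' refinement module.

```python
import numpy as np

def channel_covariance(features: np.ndarray) -> np.ndarray:
    """Covariance between the channels of one (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)                 # row k holds channel k's responses
    centered = flat - flat.mean(axis=1, keepdims=True)
    return centered @ centered.T / (h * w - 1)        # symmetric (C, C) matrix

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 16, 16))        # hypothetical feature tensor
cov = channel_covariance(feature_map)
```

Each entry of `cov` measures how two channels co-vary over spatial positions, which is one common way to summarize texture statistics of a feature map.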

Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization

May 31, 2020
Yangyang Xu

Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve a near-optimal complexity result $\tilde{O}(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds. Different from existing near-optimal methods, PStorm requires only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm can generalize better by small-batch training than other near-optimal methods that require large-batch training and the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network.
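To illustrate the idea of a momentum-based variance-reduced proximal update that draws one sample per iteration, here is a rough sketch on an L1-regularized least-squares problem. The constant step size `eta`, the constant momentum weight `beta`, the test problem, and all function names are assumptions made for illustration; the abstract does not state the paper's parameter schedule, and this toy problem is convex even though the paper targets nonconvex problems.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, b, x, lam):
    """Average squared loss plus the L1 penalty."""
    return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.abs(x).sum()

def prox_storm(A, b, lam=0.01, eta=0.01, beta=0.1, iters=3000, seed=0):
    """One-sample proximal update with a momentum-based gradient estimator."""
    rng = np.random.default_rng(seed)
    n, dim = A.shape

    def grad(x, i):
        # Gradient of the i-th loss 0.5 * (a_i^T x - b_i)^2
        return (A[i] @ x - b[i]) * A[i]

    x_prev = np.zeros(dim)
    d = grad(x_prev, rng.integers(n))                 # plain stochastic gradient to start
    x = soft_threshold(x_prev - eta * d, eta * lam)
    for _ in range(iters):
        i = rng.integers(n)                           # a single new sample
        # The same sample is evaluated at the current and the previous iterate
        d = grad(x, i) + (1.0 - beta) * (d - grad(x_prev, i))
        x_prev, x = x, soft_threshold(x - eta * d, eta * lam)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
x_true = np.zeros(10)
x_true[:3] = 1.0
b = A @ x_true
x_hat = prox_storm(A, b)
```

The correction term `d - grad(x_prev, i)` reuses the previous estimate, which is what lets the estimator reduce variance without a large batch.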

Katyusha Acceleration for Convex Finite-Sum Compositional Optimization

Oct 24, 2019
Yibo Xu, Yangyang Xu

Structured problems arise in many applications. To solve these problems, it is important to leverage the structure information. This paper focuses on convex problems with a finite-sum compositional structure. Finite-sum problems appear as the sample average approximation of a stochastic optimization problem and also arise in machine learning with a huge amount of training data. One popularly used numerical approach for finite-sum problems is the stochastic gradient method (SGM). However, the additional compositional structure prohibits easy access to unbiased stochastic approximation of the gradient, so directly applying the SGM to a finite-sum compositional optimization problem (COP) is often inefficient. We design new algorithms for solving strongly-convex and also convex two-level finite-sum COPs. Our design incorporates the Katyusha acceleration technique and adopts the mini-batch sampling from both outer-level and inner-level finite-sum. We first analyze the algorithm for strongly-convex finite-sum COPs. Similar to a few existing works, we obtain linear convergence rate in terms of the expected objective error, and from the convergence rate result, we then establish complexity results of the algorithm to produce an $\varepsilon$-solution. Our complexity results have the same dependence on the number of component functions as existing works. However, due to the use of Katyusha acceleration, our results have better dependence on the condition number $\kappa$ and improve to $\kappa^{2.5}$ from the best-known $\kappa^3$. Finally, we analyze the algorithm for convex finite-sum COPs, which uses as a subroutine the algorithm for strongly-convex finite-sum COPs. Again, we obtain better complexity results than existing works in terms of the dependence on $\varepsilon$, improving to $\varepsilon^{-2.5}$ from the best-known $\varepsilon^{-3}$.
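For orientation, a two-level finite-sum compositional problem is commonly written as below. The symbols $f_i$, $g_j$, $n_1$, $n_2$ are notation assumed here and do not come from the abstract; the paper's formulation may include additional terms.

```latex
\min_{x} \; F(x) = \frac{1}{n_1} \sum_{i=1}^{n_1} f_i\bigl(g(x)\bigr),
\qquad
g(x) = \frac{1}{n_2} \sum_{j=1}^{n_2} g_j(x),
\qquad
\nabla F(x) = \bigl(\partial g(x)\bigr)^{\top} \frac{1}{n_1} \sum_{i=1}^{n_1} \nabla f_i\bigl(g(x)\bigr).
```

Replacing $g(x)$ inside $\nabla f_i$ with a single sampled $g_j(x)$ generally gives a biased estimate, because $\mathbb{E}_j\bigl[\nabla f_i(g_j(x))\bigr] \neq \nabla f_i\bigl(\mathbb{E}_j[g_j(x)]\bigr)$ unless $\nabla f_i$ is affine. This is the difficulty the abstract refers to.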

Markov Chain Block Coordinate Descent

Nov 22, 2018
Tao Sun, Yuejiao Sun, Yangyang Xu, Wotao Yin

The method of block coordinate gradient descent (BCD) has been a powerful method for large-scale optimization. This paper considers the BCD method that successively updates a series of blocks selected according to a Markov chain. This kind of block selection is neither i.i.d. random nor cyclic. On the other hand, it is a natural choice for some applications in distributed optimization and Markov decision processes, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz differentiable functions, which can be nonconvex. When the functions are convex and strongly convex, we establish sublinear and linear convergence rates, respectively. We also present a method of Markov chain inertial BCD. Finally, we discuss potential applications.
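A small sketch of block selection driven by a Markov chain follows, assuming single-coordinate blocks, a lazy random walk on a ring as the chain, and a strongly convex quadratic objective. These choices and the function name `markov_chain_bcd` are illustrative assumptions, not the setting analyzed in the paper.

```python
import numpy as np

def markov_chain_bcd(Q, c, steps=5000, seed=0):
    """Minimize 0.5 * x^T Q x - c^T x, updating one coordinate per step.

    The coordinate index follows a lazy random walk on a ring, so it is
    neither sampled i.i.d. nor visited cyclically.
    """
    rng = np.random.default_rng(seed)
    n = len(c)
    x = np.zeros(n)
    i = 0
    for _ in range(steps):
        grad_i = Q[i] @ x - c[i]             # partial derivative with respect to x[i]
        x[i] -= grad_i / Q[i, i]             # step size 1 / L_i, where L_i = Q[i, i]
        i = (i + rng.choice([-1, 0, 1])) % n # move to a neighbour or stay put
    return x

rng = np.random.default_rng(1)
R = rng.random((6, 6))
Q = (R + R.T) / 2 + 6 * np.eye(6)            # symmetric and strictly diagonally dominant
c = rng.standard_normal(6)
x_mc = markov_chain_bcd(Q, c)
```

On this well-conditioned example the iterate approaches the minimizer `np.linalg.solve(Q, c)`, since the walk keeps revisiting every coordinate.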

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Nov 01, 2018
Bo Liu, Tengyang Xie, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, Daesub Yoon

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proofs. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.
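The Fenchel-dual reformulation mentioned above can be sketched as follows, assuming a return $R$, a risk weight $\lambda > 0$, and an auxiliary scalar $y$. These symbols are notation introduced here, and the exact objective used in the paper may differ.

```latex
\operatorname{Var}(R) = \mathbb{E}[R^2] - \bigl(\mathbb{E}[R]\bigr)^2,
\qquad
\bigl(\mathbb{E}[R]\bigr)^2 = \max_{y \in \mathbb{R}} \bigl( 2y\,\mathbb{E}[R] - y^2 \bigr),
```

```latex
\mathbb{E}[R] - \lambda \operatorname{Var}(R)
= \max_{y \in \mathbb{R}} \Bigl( \mathbb{E}[R] - \lambda\,\mathbb{E}[R^2] + \lambda \bigl( 2y\,\mathbb{E}[R] - y^2 \bigr) \Bigr).
```

The squared expectation, which is awkward to estimate from single samples, is replaced by a maximization over $y$ in which every term is linear in an expectation. The problem then becomes a joint maximization over the policy parameters and $y$, which suits block coordinate ascent.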

* Accepted by NIPS 2018

Ensemble One-dimensional Convolution Neural Networks for Skeleton-based Action Recognition

Jan 13, 2018
Yangyang Xu, Lei Wang

In this paper, we propose an effective and extensible residual one-dimensional convolutional neural network as a base network. Based on this network, we propose four subnets that explore the features of skeleton sequences from different aspects. Given a skeleton sequence, the spatial information is encoded in the joint coordinates within a frame, and the temporal information is represented across multiple frames. Because this sequence representation prevents two-dimensional convolutional neural networks from being used directly, we choose the one-dimensional convolution layer as the basic layer. Each subnet extracts discriminative features from a different aspect. The first subnet is a two-stream network that explores both temporal and spatial information. The second is a body-part network, which captures micro spatial features and macro temporal features. The third is an attention network, whose main contribution is to focus on the key frames and feature channels that are highly related to the action classes in a skeleton sequence. The last subnet, a frame-difference network, mainly processes the changes in joints between consecutive frames. The four subnets are ensembled by late fusion. The key requirements of an ensemble method are that each subnet achieves a reasonable level of performance and that the subnets are diverse. Here, each subnet shares a well-performing base network, and the differences between the subnets guarantee the diversity. Experimental results show that the ensemble network achieves state-of-the-art performance on three widely used datasets.
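Two of the ingredients named in the abstract, frame differences and late fusion, can be sketched in NumPy as below. The sequence shape (frames, joints, 3), the 25-joint skeleton, the 60 classes, and the use of averaged softmax scores as the fusion rule are all assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def frame_difference(seq: np.ndarray) -> np.ndarray:
    """Joint displacement between consecutive frames of a (T, J, 3) sequence."""
    return seq[1:] - seq[:-1]                        # shape (T - 1, J, 3)

def to_conv1d_input(seq: np.ndarray) -> np.ndarray:
    """Flatten joints into channels so time is the single convolution axis."""
    return seq.reshape(seq.shape[0], -1).T           # shape (J * 3, T)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(logits_per_subnet):
    """Average the class probabilities predicted by each subnet."""
    return np.mean([softmax(l) for l in logits_per_subnet], axis=0)

rng = np.random.default_rng(0)
skeleton = rng.standard_normal((30, 25, 3))          # hypothetical: 30 frames, 25 joints
diff = frame_difference(skeleton)
channels = to_conv1d_input(diff)
fused = late_fusion([rng.standard_normal(60) for _ in range(4)])  # 4 subnets, 60 classes
```

Averaging probabilities is only one possible late-fusion rule; weighted sums or products of scores are other common options.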

* The title of Table 3 contains an error, and the experiments are insufficient

Hybrid Jacobian and Gauss-Seidel proximal block coordinate update methods for linearly constrained convex programming

Jan 03, 2018
Yangyang Xu

Recent years have witnessed the rapid development of block coordinate update (BCU) methods, which are particularly suitable for problems involving large-sized data and/or variables. In optimization, BCU first appeared as the coordinate descent method, which works well for smooth problems or those with separable nonsmooth terms and/or separable constraints. When nonseparable constraints exist, BCU can be applied under primal-dual settings. In the literature, it has been shown that for weakly convex problems with a nonseparable linear constraint, BCU with a fully Gauss-Seidel updating rule may fail to converge, whereas BCU with a fully Jacobian rule can converge sublinearly. However, empirically the method with the Jacobian update is usually slower than that with the Gauss-Seidel rule. To retain the advantages of both, we propose a hybrid Jacobian and Gauss-Seidel BCU method for solving linearly constrained multi-block structured convex programs, where the objective may have a nonseparable quadratic term and separable nonsmooth terms. At each primal block variable update, the method approximates the augmented Lagrangian function at an affine combination of the previous two iterates, and the affine mixing matrix with the desired properties can be chosen by solving a semidefinite program. We show that the hybrid method enjoys the same theoretical convergence guarantee as Jacobian BCU. In addition, we numerically demonstrate that the method can perform as well as the Gauss-Seidel method and better than a recently proposed randomized primal-dual BCU method.
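The difference between the Jacobian and Gauss-Seidel updating rules can be seen in the classical setting of solving a linear system, sketched below. This is the textbook pair of iterations on a hypothetical 3x3 system, shown only to contrast the two rules; it is not the hybrid primal-dual method of the paper.

```python
import numpy as np

def jacobi_sweep(A, b, x):
    """Update every coordinate using only the previous iterate."""
    d = np.diag(A)
    return (b - (A @ x - d * x)) / d

def gauss_seidel_sweep(A, b, x):
    """Update coordinates in order, reusing the newest values immediately."""
    x = x.copy()
    for i in range(len(b)):
        # A[i] @ x already contains the coordinates updated earlier in this sweep
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [1.0, 2.0, 6.0]])                      # strictly diagonally dominant
b = np.array([1.0, 2.0, 3.0])

x_jacobi = np.zeros(3)
x_gs = np.zeros(3)
for _ in range(60):
    x_jacobi = jacobi_sweep(A, b, x_jacobi)
    x_gs = gauss_seidel_sweep(A, b, x_gs)
```

The Jacobian sweep is trivially parallel because no coordinate waits for another, while the Gauss-Seidel sweep is sequential but uses fresher information. That trade-off is what a hybrid rule tries to balance.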

* Accepted in SIAM Journal on Optimization

Accelerated Primal-Dual Proximal Block Coordinate Updating Methods for Constrained Convex Optimization

Nov 20, 2017
Yangyang Xu, Shuzhong Zhang

Block Coordinate Update (BCU) methods enjoy low per-update computational complexity because each time only one or a few block variables need to be updated among a possibly large number of blocks. They are also easily parallelized and thus have been particularly popular for solving problems involving large-scale datasets and/or variables. In this paper, we propose a primal-dual BCU method for solving linearly constrained convex programs in multi-block variables. The method is an accelerated version of a primal-dual algorithm proposed by the authors, which applies randomization in selecting block variables to update and establishes an $O(1/t)$ convergence rate under a weak convexity assumption. We show that the rate can be accelerated to $O(1/t^2)$ if the objective is strongly convex. In addition, if one block variable is independent of the others in the objective, we show that the algorithm can be modified to achieve a linear rate of convergence. The numerical experiments show that the accelerated method performs stably with a single set of parameters, while the original method needs its parameters tuned for different datasets in order to achieve a comparable level of performance.

* Accepted to Computational Optimization and Applications

On the Convergence of Asynchronous Parallel Iteration with Unbounded Delays

Nov 15, 2017
Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin

Recent years have witnessed a surge of asynchronous parallel (async-parallel) iterative algorithms, driven by problems involving very large-scale data and a large number of decision variables. Because of asynchrony, the iterates are computed with outdated information, and the age of the outdated information, which we call delay, is the number of times it has been updated since its creation. Almost all recent works prove convergence under the assumption of a finite maximum delay and set their stepsize parameters accordingly. However, the maximum delay is practically unknown. This paper presents a convergence analysis of an async-parallel method from a probabilistic viewpoint, and it allows for large unbounded delays. An explicit stepsize formula that guarantees convergence is given in terms of the delays' statistics. With $p+1$ identical processors, we empirically measured that delays closely follow the Poisson distribution with parameter $p$, matching our theoretical model, and thus the stepsize can be set accordingly. Simulations on both convex and nonconvex optimization problems demonstrate the validity of our analysis and also show that the existing maximum-delay-induced stepsize is too conservative, often slowing down the convergence of the algorithm.
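A toy simulation of an iteration that reads stale information with Poisson-distributed delays is sketched below. The objective $0.5\|x\|^2$, the fixed stepsize `eta`, and the function name are assumptions chosen for illustration; the paper's delay-dependent stepsize formula is not given in the abstract and is not reproduced here.

```python
import numpy as np

def delayed_gradient_descent(x0, p=4, eta=0.05, iters=500, seed=0):
    """Gradient steps on 0.5 * ||x||^2 using iterates that are delay steps old."""
    rng = np.random.default_rng(seed)
    history = [np.asarray(x0, dtype=float)]
    for k in range(iters):
        delay = min(int(rng.poisson(p)), k)          # cannot read before iteration 0
        stale = history[k - delay]                   # outdated iterate seen by a worker
        # The gradient of 0.5 * ||x||^2 at the stale point is the stale point itself
        history.append(history[k] - eta * stale)
    return history[-1]

x0 = np.ones(5)
x_final = delayed_gradient_descent(x0)
```

Here the delay is unbounded in principle, since a Poisson variable can take any nonnegative integer value, yet a modest fixed stepsize still drives the iterate toward the minimizer at zero in this simple setting.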

* accepted to JORSC

Asynchronous parallel primal-dual block update methods

May 18, 2017
Yangyang Xu

Recent years have witnessed a surge of asynchronous (async-) parallel computing methods, driven by the extremely big data involved in many modern applications and by the advancement of multi-core machines and computer clusters. In optimization, most works on async-parallel methods address unconstrained problems or those with block separable constraints. In this paper, we propose an async-parallel method based on block coordinate update (BCU) for solving convex problems with a nonseparable linear constraint. Running on a single node, the method becomes a novel randomized primal-dual BCU with adaptive stepsize for multi-block affinely constrained problems. For these problems, Gauss-Seidel cyclic primal-dual BCU needs strong convexity to converge. In contrast, merely assuming convexity, we show that the objective value sequence generated by the proposed algorithm converges in probability to the optimal value and that the constraint residual converges to zero. In addition, we establish an ergodic $O(1/k)$ convergence result, where $k$ is the number of iterations. Numerical experiments demonstrate the efficiency of the proposed method and its significantly better speed-up performance compared with its sync-parallel counterpart.