In this paper, we consider the Tensor Robust Principal Component Analysis (TRPCA) problem, which aims to exactly recover the low-rank and sparse components from their sum. Our model is based on the recently proposed tensor-tensor product (or t-product) [13]. Induced by the t-product, we first rigorously deduce the tensor spectral norm, tensor nuclear norm, and tensor average rank, and show that the tensor nuclear norm is the convex envelope of the tensor average rank within the unit ball of the tensor spectral norm. These definitions, their relationships and properties are consistent with matrix cases. Equipped with the new tensor nuclear norm, we then solve the TRPCA problem by solving a convex program and provide the theoretical guarantee for the exact recovery. Our TRPCA model and recovery guarantee include matrix RPCA as a special case. Numerical experiments verify our results, and the applications to image recovery and background modeling problems demonstrate the effectiveness of our method.
Improving information flow in deep networks helps to ease the training difficulties and utilize parameters more efficiently. Here we propose a new convolutional neural network architecture with alternately updated clique (CliqueNet). In contrast to prior networks, there are both forward and backward connections between any two layers in the same block. The layers are constructed as a loop and are updated alternately. The CliqueNet has some unique properties. For each layer, it is both the input and output of any other layer in the same block, so that the information flow among layers is maximized. During propagation, the newly updated layers are concatenated to re-update previously updated layer, and parameters are reused for multiple times. This recurrent feedback structure is able to bring higher level visual information back to refine low-level filters and achieve spatial attention. We analyze the features generated at different stages and observe that using refined features leads to a better result. We adopt a multi-scale feature strategy that effectively avoids the progressive growth of parameters. Experiments on image recognition datasets including CIFAR-10, CIFAR-100, SVHN and ImageNet show that our proposed models achieve the state-of-the-art performance with fewer parameters.
Asynchronous algorithms have attracted much attention recently due to the crucial demands on solving large-scale optimization problems. However, the accelerated versions of asynchronous algorithms are rarely studied. In this paper, we propose the "momentum compensation" technique to accelerate asynchronous algorithms for convex problems. Specifically, we first accelerate the plain Asynchronous Gradient Descent, which achieves a faster $O(1/\sqrt{\epsilon})$ (v.s. $O(1/\epsilon)$) convergence rate for non-strongly convex functions, and $O(\sqrt{\kappa}\log(1/\epsilon))$ (v.s. $O(\kappa \log(1/\epsilon))$) for strongly convex functions to reach an $\epsilon$- approximate minimizer with the condition number $\kappa$. We further apply the technique to accelerate modern stochastic asynchronous algorithms such as Asynchronous Stochastic Coordinate Descent and Asynchronous Stochastic Gradient Descent. Both of the resultant practical algorithms are faster than existing ones by order. To the best of our knowledge, we are the first to consider accelerated algorithms that allow updating by delayed gradients and are the first to propose truly accelerated asynchronous algorithms. Finally, the experimental results on a shared memory system show that acceleration can lead to significant performance gains on ill-conditioned problems.
Recurrent neural networks have achieved excellent performance in many applications. However, on portable devices with limited resources, the models are often too large to deploy. For applications on the server with large scale concurrent requests, the latency during inference can also be very critical for costly computing resources. In this work, we address these problems by quantizing the network, both weights and activations, into multiple binary codes {-1,+1}. We formulate the quantization as an optimization problem. Under the key observation that once the quantization coefficients are fixed the binary codes can be derived efficiently by binary search tree, alternating minimization is then applied. We test the quantization for two well-known RNNs, i.e., long short term memory (LSTM) and gated recurrent unit (GRU), on the language models. Compared with the full-precision counter part, by 2-bit quantization we can achieve ~16x memory saving and ~6x real inference acceleration on CPUs, with only a reasonable loss in the accuracy. By 3-bit quantization, we can achieve almost no loss in the accuracy or even surpass the original model, with ~10.5x memory saving and ~3x real inference acceleration. Both results beat the exiting quantization works with large margins. We extend our alternating quantization to image classification tasks. In both RNNs and feedforward neural networks, the method also achieves excellent performance.
Spectral Clustering (SC) is a widely used data clustering method which first learns a low-dimensional embedding $U$ of data by computing the eigenvectors of the normalized Laplacian matrix, and then performs k-means on $U^\top$ to get the final clustering result. The Sparse Spectral Clustering (SSC) method extends SC with a sparse regularization on $UU^\top$ by using the block diagonal structure prior of $UU^\top$ in the ideal case. However, encouraging $UU^\top$ to be sparse leads to a heavily nonconvex problem which is challenging to solve and the work (Lu, Yan, and Lin 2016) proposes a convex relaxation in the pursuit of this aim indirectly. However, the convex relaxation generally leads to a loose approximation and the quality of the solution is not clear. This work instead considers to solve the nonconvex formulation of SSC which directly encourages $UU^\top$ to be sparse. We propose an efficient Alternating Direction Method of Multipliers (ADMM) to solve the nonconvex SSC and provide the convergence guarantee. In particular, we prove that the sequences generated by ADMM always exist a limit point and any limit point is a stationary point. Our analysis does not impose any assumptions on the iterates and thus is practical. Our proposed ADMM for nonconvex problems allows the stepsize to be increasing but upper bounded, and this makes it very efficient in practice. Experimental analysis on several real data sets verifies the effectiveness of our method.
In the literature, most existing graph-based semi-supervised learning (SSL) methods only use the label information of observed samples in the label propagation stage, while ignoring such valuable information when learning the graph. In this paper, we argue that it is beneficial to consider the label information in the graph learning stage. Specifically, by enforcing the weight of edges between labeled samples of different classes to be zero, we explicitly incorporate the label information into the state-of-the-art graph learning methods, such as the Low-Rank Representation (LRR), and propose a novel semi-supervised graph learning method called Semi-Supervised Low-Rank Representation (SSLRR). This results in a convex optimization problem with linear constraints, which can be solved by the linearized alternating direction method. Though we take LRR as an example, our proposed method is in fact very general and can be applied to any self-representation graph learning methods. Experiment results on both synthetic and real datasets demonstrate that the proposed graph learning method can better capture the global geometric structure of the data, and therefore is more effective for semi-supervised learning tasks.
The Schatten-$p$ norm ($0<p<1$) has been widely used to replace the nuclear norm for better approximating the rank function. However, existing methods are either 1) not scalable for large scale problems due to relying on singular value decomposition (SVD) in every iteration, or 2) specific to some $p$ values, e.g., $1/2$, and $2/3$. In this paper, we show that for any $p$, $p_1$, and $p_2 >0$ satisfying $1/p=1/p_1+1/p_2$, there is an equivalence between the Schatten-$p$ norm of one matrix and the Schatten-$p_1$ and the Schatten-$p_2$ norms of its two factor matrices. We further extend the equivalence to multiple factor matrices and show that all the factor norms can be convex and smooth for any $p>0$. In contrast, the original Schatten-$p$ norm for $0<p<1$ is non-convex and non-smooth. As an example we conduct experiments on matrix completion. To utilize the convexity of the factor matrix norms, we adopt the accelerated proximal alternating linearized minimization algorithm and establish its sequence convergence. Experiments on both synthetic and real datasets exhibit its superior performance over the state-of-the-art methods. Its speed is also highly competitive.
A new submodule clustering method via sparse and low-rank representation for multi-way data is proposed in this paper. Instead of reshaping multi-way data into vectors, this method maintains their natural orders to preserve data intrinsic structures, e.g., image data kept as matrices. To implement clustering, the multi-way data, viewed as tensors, are represented by the proposed tensor sparse and low-rank model to obtain its submodule representation, called a free module, which is finally used for spectral clustering. The proposed method extends the conventional subspace clustering method based on sparse and low-rank representation to multi-way data submodule clustering by combining t-product operator. The new method is tested on several public datasets, including synthetical data, video sequences and toy images. The experiments show that the new method outperforms the state-of-the-art methods, such as Sparse Subspace Clustering (SSC), Low-Rank Representation (LRR), Ordered Subspace Clustering (OSC), Robust Latent Low Rank Representation (RobustLatLRR) and Sparse Submodule Clustering method (SSmC).
Tag-based image retrieval (TBIR) has drawn much attention in recent years due to the explosive amount of digital images and crowdsourcing tags. However, the TBIR applications still suffer from the deficient and inaccurate tags provided by users. Inspired by the subspace clustering methods, we formulate the tag completion problem in a subspace clustering model which assumes that images are sampled from subspaces, and complete the tags using the state-of-the-art Low Rank Representation (LRR) method. And we propose a matrix completion algorithm to further refine the tags. Our empirical results on multiple benchmark datasets for image annotation show that the proposed algorithm outperforms state-of-the-art approaches when handling missing and noisy tags.
Annotating images with tags is useful for indexing and retrieving images. However, many available annotation data include missing or inaccurate annotations. In this paper, we propose an image annotation framework which sequentially performs tag completion and refinement. We utilize the subspace property of data via sparse subspace clustering for tag completion. Then we propose a novel matrix completion model for tag refinement, integrating visual correlation, semantic correlation and the novelly studied property of complex errors. The proposed method outperforms the state-of-the-art approaches on multiple benchmark datasets even when they contain certain levels of annotation noise.