The DANE algorithm is an approximate Newton method popularly used for communication-efficient distributed machine learning. Reasons for the interest in DANE include its scalability and versatility. Convergence of DANE, however, can be tricky: its appealing convergence rate is rigorous only for quadratic objectives, and for more general convex functions the known results are no stronger than those of classic first-order methods. To remedy these drawbacks, we propose in this paper new alternatives to DANE that are more amenable to analysis. We first introduce a simple variant of DANE equipped with backtracking line search, for which global asymptotic convergence and sharper local non-asymptotic convergence rate guarantees can be proved for both quadratic and non-quadratic strongly convex functions. We then propose a heavy-ball method to accelerate the convergence of DANE, showing that a nearly tight local rate of convergence can be established for strongly convex functions, and that with a proper modification of the algorithm the same result applies globally to linear prediction models. Numerical evidence is provided to confirm the theoretical and practical advantages of our methods.
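As a point of reference for the acceleration scheme, the generic heavy-ball (Polyak momentum) update takes the following form. This is a minimal sketch of plain heavy-ball gradient descent on a toy quadratic, not the paper's DANE-specific variant; the step size "lr" and momentum "beta" are illustrative.

\begin{verbatim}
import numpy as np

def heavy_ball(grad, x0, lr=0.1, beta=0.9, iters=200):
    # Generic heavy-ball iteration:
    #   x_{t+1} = x_t - lr * grad(x_t) + beta * (x_t - x_{t-1})
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        x_next = x - lr * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Toy example: minimize the strongly convex quadratic 0.5*x'Ax - b'x,
# whose gradient is Ax - b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
x_star = heavy_ball(lambda x: A @ x - b, np.zeros(2))
\end{verbatim}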
A tacit assumption in linear regression is that (response, predictor)-pairs correspond to identical observational units. A series of recent works have studied scenarios in which this assumption is violated, under terms such as ``Unlabeled Sensing'' and ``Regression with Unknown Permutation''. In this paper, we study a setup with multiple response variables and a notion of mismatches that generalizes permutations so as to allow for missing matches as well as one-to-many matches. A two-stage method is proposed under the assumption that most pairs are correctly matched. In the first stage, the regression parameter is estimated by treating mismatches as contaminations, and subsequently the generalized permutation is estimated by a basic variant of matching. The approach is both computationally convenient and equipped with favorable statistical guarantees. Specifically, it is shown that the conditions for permutation recovery become considerably less stringent as the number of responses $m$ per observation increases. In particular, for $m = \Omega(\log n)$, the required signal-to-noise ratio no longer depends on the sample size $n$. Numerical results on synthetic and real data are presented to support the main findings of our analysis.
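To make the two-stage idea concrete, the following is a minimal sketch under the abstract's assumption that most pairs are correctly matched, simplified to a single response ($m = 1$): a robust fit that treats mismatched rows as outliers (a Huber regressor is used here as a stand-in for the paper's first-stage estimator), followed by re-matching each observed response to the closest fitted value as a basic variant of matching. All names and thresholds are illustrative.

\begin{verbatim}
import numpy as np
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + 0.1 * rng.standard_normal(n)

# Mismatch a minority of the (response, predictor)-pairs.
perm = np.arange(n)
perm[:20] = rng.permutation(perm[:20])
y_obs = y[perm]

# Stage 1: estimate the regression parameter, handling mismatches
# as contaminations via a robust loss.
huber = HuberRegressor().fit(X, y_obs)

# Stage 2: estimate the (generalized) permutation by matching each
# observed response to the nearest fitted value; this allows
# one-to-many matches, unlike a strict permutation.
fitted = X @ huber.coef_ + huber.intercept_
match = np.abs(y_obs[:, None] - fitted[None, :]).argmin(axis=1)
\end{verbatim}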
In this paper, we consider the problem of open information extraction (OIE): extracting entity- and relation-level intermediate structures from open-domain sentences. We focus on four types of valuable intermediate structures (Relation, Attribute, Description, and Concept) and propose a unified knowledge expression form, SAOKE, to express them. We publicly release a data set containing more than forty thousand sentences and the corresponding facts in the SAOKE format, labeled by crowd-sourcing. To our knowledge, this is the largest publicly available human-labeled data set for open information extraction tasks. Using this labeled SAOKE data set, we train an end-to-end neural model in the sequence-to-sequence paradigm, called Logician, to transform sentences into facts. Unlike existing algorithms, which generally extract each single fact without considering other possible facts, Logician performs for each sentence a global optimization over all possibly involved facts, in which facts not only compete with each other to attract the attention of words but also cooperate to share words. An experimental study on various types of open-domain relation extraction tasks reveals the consistent superiority of Logician over other state-of-the-art algorithms. The experiments verify the soundness of the SAOKE format, the value of the SAOKE data set, the effectiveness of the proposed Logician model, and the feasibility of applying the end-to-end learning paradigm on supervised data sets to the challenging task of open information extraction.
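For illustration only, a sentence-to-facts formulation can be sketched as a standard encoder-decoder with attention that emits all facts of a sentence as one target sequence, which is the setting in which facts can interact through shared attention over the source words. This minimal PyTorch stand-in is not Logician's actual architecture; the dimensions and single-layer GRUs are placeholder choices.

\begin{verbatim}
import torch
import torch.nn as nn

class FactSeq2Seq(nn.Module):
    # A stand-in encoder-decoder with attention that emits all facts of
    # a sentence as a single target sequence (not Logician itself).
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        enc, h = self.encoder(self.embed(src_ids))   # (B, S, H), (1, B, H)
        logits = []
        for t in range(tgt_ids.size(1)):
            q = h[-1].unsqueeze(1)                   # decoder state as query
            attn = torch.softmax((q * enc).sum(-1, keepdim=True), dim=1)
            ctx = (attn * enc).sum(1, keepdim=True)  # attention over words
            step = torch.cat([self.embed(tgt_ids[:, t:t+1]), ctx], dim=-1)
            dec, h = self.decoder(step, h)
            logits.append(self.out(dec))
        return torch.cat(logits, dim=1)              # (B, T, vocab)

# Illustrative usage with toy token ids.
model = FactSeq2Seq(vocab_size=1000)
src = torch.randint(0, 1000, (2, 12))   # sentence tokens
tgt = torch.randint(0, 1000, (2, 20))   # concatenated fact tokens
scores = model(src, tgt)
\end{verbatim}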
In this paper, we present a novel unsupervised video summarization model that requires no manual annotation. The proposed model, termed Cycle-SUM, adopts a new cycle-consistent adversarial LSTM architecture that effectively maximizes the information preservation and compactness of the summary video. It consists of a frame selector and a cycle-consistent learning-based evaluator. The selector is a bidirectional LSTM network that learns video representations embedding the long-range relationships among video frames. The evaluator defines a learnable information-preserving metric between the original video and the summary video and "supervises" the selector to identify the most informative frames to form the summary video. In particular, the evaluator is composed of two generative adversarial networks (GANs): the forward GAN learns to reconstruct the original video from the summary video, while the backward GAN learns to invert this process. The consistency between the outputs of this cycle learning is adopted as the information-preserving metric for video summarization. We demonstrate the close relation between mutual information maximization and this cycle learning procedure. Experiments on two video summarization benchmark datasets validate the state-of-the-art performance and superiority of the Cycle-SUM model over previous baselines.
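A minimal sketch of the cycle-consistency term used as the information-preserving metric, assuming frame-wise features and a summary represented as a soft-masked sequence of the same length. The two feed-forward nets stand in for the paper's GAN generators, and the L1 consistency and dimensions are illustrative.

\begin{verbatim}
import torch
import torch.nn as nn

feat_dim = 128  # placeholder per-frame feature dimension

# Stand-ins for the forward generator (summary -> video) and the
# backward generator (video -> summary); the paper uses GAN generators.
forward_g = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                          nn.Linear(feat_dim, feat_dim))
backward_g = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                           nn.Linear(feat_dim, feat_dim))

def cycle_consistency_loss(video, summary):
    # L1 consistency between each sequence and its cycle reconstruction.
    cycle_summary = backward_g(forward_g(summary))  # summary -> video -> summary
    cycle_video = forward_g(backward_g(video))      # video -> summary -> video
    return ((summary - cycle_summary).abs().mean()
            + (video - cycle_video).abs().mean())

# Illustrative usage: selector scores act as a soft mask over frames.
video = torch.randn(1, 60, feat_dim)
summary = video * torch.rand(1, 60, 1)
loss = cycle_consistency_loss(video, summary)
\end{verbatim}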
We consider a new variant of \textsc{AMSGrad}. \textsc{AMSGrad} \cite{RKK18} is a popular adaptive-gradient optimization algorithm widely used in training deep neural networks. Our new variant assumes that mini-batch gradients in consecutive iterations have some underlying structure, which makes the gradients sequentially predictable. By exploiting this predictability together with ideas from \textsc{Optimistic Online Learning}, the new algorithm can accelerate convergence and enjoys a tighter regret bound. We conduct experiments on training various neural networks on several datasets and show that the proposed method speeds up convergence in practice.
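For reference, the baseline \textsc{AMSGrad} update of \cite{RKK18}, which the proposed variant builds on, can be sketched as follows; this omits the optimistic gradient-prediction step that constitutes the paper's contribution, and the hyperparameters are the customary defaults.

\begin{verbatim}
import numpy as np

def amsgrad(grad, x0, lr=1e-3, beta1=0.9, beta2=0.999,
            eps=1e-8, iters=1000):
    # Baseline AMSGrad: like Adam, but with a running maximum of the
    # second-moment estimate, which guarantees a non-increasing
    # effective step size.
    x = x0.copy()
    m = np.zeros_like(x)      # first moment (momentum)
    v = np.zeros_like(x)      # second moment
    v_hat = np.zeros_like(x)  # running max of v (the AMSGrad fix)
    for _ in range(iters):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)
        x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x
\end{verbatim}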
This paper proposes a novel RGB-D SLAM method for dynamic environments. It follows traditional feature-based SLAM methods and uses a feature-group segmentation method that exploits correlations between points to resist the disturbance caused by dynamic objects. The correlations between map points, represented as a sparse graph, are created by Delaunay triangulation. After removing inconsistent connections, the dynamic objects are separated from the static background. Only the features in the static map are used for motion estimation and bundle adjustment, which improves the accuracy and robustness of SLAM in dynamic environments. The effectiveness of the proposed SLAM is evaluated on the TUM RGB-D benchmark. The experiments demonstrate that dynamic features are successfully removed and the system works reliably in both low- and high-dynamic environments. Comparisons between the proposed method and state-of-the-art visual systems clearly show that comparably accurate results are achieved in low-dynamic environments and that performance is improved significantly in high-dynamic environments.
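A minimal sketch of the graph-construction step: triangulate the map points with Delaunay triangulation (via scipy) and keep only edges whose length is preserved across frames, on the premise that distances within the rigid static background stay constant while edges touching moving objects do not. The projection to 2D for triangulation and the tolerance are illustrative, not the paper's exact consistency criterion.

\begin{verbatim}
import numpy as np
from scipy.spatial import Delaunay

def consistent_edges(pts_prev, pts_curr, tol=0.05):
    # Triangulate the previous map points (projected to 2D) and keep
    # edges whose 3D length is preserved between frames, as expected
    # for points on the rigid static background.
    tri = Delaunay(pts_prev[:, :2])
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            a, b = sorted((int(simplex[i]), int(simplex[(i + 1) % 3])))
            edges.add((a, b))
    kept = []
    for a, b in edges:
        d_prev = np.linalg.norm(pts_prev[a] - pts_prev[b])
        d_curr = np.linalg.norm(pts_curr[a] - pts_curr[b])
        if abs(d_curr - d_prev) <= tol * max(d_prev, 1e-9):
            kept.append((a, b))
    return kept

# Illustrative usage: a nearly rigid point set between two frames.
rng = np.random.default_rng(1)
p0 = rng.random((50, 3))
p1 = p0 + 0.001 * rng.standard_normal((50, 3))
static_graph = consistent_edges(p0, p1)
\end{verbatim}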
This paper is concerned with the hard thresholding operator, which sets all but the $k$ largest absolute elements of a vector to zero. We establish a {\em tight} bound that quantitatively characterizes the deviation of the thresholded solution from a given signal. Our theoretical result is universal in the sense that it holds for all choices of parameters, and the underlying analysis depends only on fundamental arguments in mathematical optimization. We discuss the implications for two domains. Compressed sensing: on account of the crucial estimate, we establish the connection between the restricted isometry property (RIP) and the sparsity parameter for a broad family of hard-thresholding-based algorithms, which yields an improved RIP condition, especially when the true sparsity is unknown. This suggests that, in essence, many more kinds of sensing matrices, or fewer measurements, are admissible for the data acquisition procedure. Machine learning: for large-scale machine learning, a significant yet challenging problem is learning accurate sparse models in an efficient manner. In stark contrast to prior work that relied on the $\ell_1$-relaxation to promote sparsity, we present a novel stochastic algorithm that performs hard thresholding in each iteration, hence ensuring such parsimonious solutions. Equipped with the developed bound, we prove {\em global linear convergence} for a number of prevalent statistical models under mild assumptions, even though the problem turns out to be non-convex.
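The operator under study is simple to state in code; a minimal NumPy version:

\begin{verbatim}
import numpy as np

def hard_threshold(x, k):
    # Keep the k largest-magnitude entries of x; zero out the rest.
    out = np.zeros_like(x)
    if k > 0:
        idx = np.argpartition(np.abs(x), -k)[-k:]
        out[idx] = x[idx]
    return out

# Example: hard_threshold([0.3, -2.0, 0.1, 1.5], k=2) -> [0., -2., 0., 1.5]
print(hard_threshold(np.array([0.3, -2.0, 0.1, 1.5]), 2))
\end{verbatim}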
While tree methods have been popular in practice, researchers and practitioners are also looking for simple algorithms that can reach accuracy similar to that of trees. In 2010, (Ping Li UAI'10) developed the "abc-robust-logitboost" method and compared it with other supervised learning methods on datasets used in the deep learning literature. In this study, we propose a series of "tunable GMM kernels" which are simple and perform largely comparably to tree methods on the same datasets. Note that "abc-robust-logitboost" substantially improved the original GBDT in that (a) it developed a tree-split formula based on second-order information of the derivatives of the loss function, and (b) it developed a new set of derivatives for the multi-class classification formulation. A prior study in 2017 showed that the "generalized min-max" (GMM) kernel has good performance compared to the "radial basis function" (RBF) kernel. However, as demonstrated in this paper, the original GMM kernel is often not as competitive as tree methods on the datasets used in the deep learning literature. Since the original GMM kernel has no parameters, we propose tunable GMM kernels by adding tuning parameters in various ways. Three basic (i.e., one-parameter) GMM kernels are the "$e$GMM kernel", the "$p$GMM kernel", and the "$\gamma$GMM kernel". Extensive experiments show that they are able to produce good results for a large number of classification tasks. Furthermore, the basic kernels can be combined to boost performance.
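For concreteness, a sketch of the GMM kernel after the standard nonnegative transform $u \mapsto (\max(u,0), \max(-u,0))$, with one illustrative way of adding a single tuning parameter via a power transform; the paper's exact parameterizations of the $e$GMM, $p$GMM, and $\gamma$GMM kernels may differ.

\begin{verbatim}
import numpy as np

def gmm_kernel(u, v, p=1.0):
    # Generalized min-max kernel on the nonnegative transform of u, v.
    # p = 1 recovers the parameter-free GMM kernel; the exponent p is
    # one illustrative single tuning parameter (the paper's exact
    # parameterizations may differ). The value lies in [0, 1].
    ut = np.concatenate([np.maximum(u, 0), np.maximum(-u, 0)]) ** p
    vt = np.concatenate([np.maximum(v, 0), np.maximum(-v, 0)]) ** p
    return np.minimum(ut, vt).sum() / np.maximum(ut, vt).sum()

rng = np.random.default_rng(0)
u, v = rng.standard_normal(10), rng.standard_normal(10)
k = gmm_kernel(u, v, p=1.0)
\end{verbatim}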
The method of 1-bit ("sign-sign") random projections has been a popular tool for efficient search and machine learning on large datasets. Given two $D$-dimensional data vectors $u$, $v\in\mathbb{R}^D$, one can generate $x = \sum_{i=1}^D u_i r_i$ and $y = \sum_{i=1}^D v_i r_i$, where $r_i\sim N(0,1)$ iid. The "collision probability" is $\Pr\left(\mathrm{sgn}(x)=\mathrm{sgn}(y)\right) = 1-\frac{\cos^{-1}\rho}{\pi}$, where $\rho = \rho(u,v)$ is the cosine similarity. We develop "sign-full" random projections by estimating $\rho$ from (e.g.) the expectation $E(\mathrm{sgn}(x)y)=\sqrt{\frac{2}{\pi}} \rho$, which can be further substantially improved by normalizing $y$. For nonnegative data, we recommend an interesting estimator based on $E\left(y_- 1_{x\geq 0} + y_+ 1_{x<0}\right)$ and its normalized version. The recommended estimator almost matches the accuracy of the (computationally expensive) maximum likelihood estimator. At high similarity ($\rho\rightarrow1$), the asymptotic variance of the recommended estimator is only $\frac{4}{3\pi} \approx 0.4$ times that of the estimator for sign-sign projections. At a small number of projections $k$ and high similarity, the improvement is even more substantial.
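The two estimators stated in the abstract are easy to simulate. Below is a minimal Monte Carlo sketch, assuming unit-norm data so that $x$ and $y$ are standard normal with correlation $\rho$; the estimator based on $E\left(y_- 1_{x\geq 0} + y_+ 1_{x<0}\right)$ and the normalized versions are omitted.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
D, k = 1000, 500  # data dimension and number of projections

# Build unit vectors u, v with known cosine similarity rho.
rho = 0.9
u = rng.standard_normal(D); u /= np.linalg.norm(u)
w = rng.standard_normal(D); w -= (w @ u) * u; w /= np.linalg.norm(w)
v = rho * u + np.sqrt(1 - rho**2) * w

R = rng.standard_normal((k, D))  # one Gaussian projection per row
x, y = R @ u, R @ v

# "Sign-full": from E[sgn(x) y] = sqrt(2/pi) * rho.
rho_full = np.sqrt(np.pi / 2) * np.mean(np.sign(x) * y)

# Classic "sign-sign": invert the collision probability
# Pr(sgn(x) = sgn(y)) = 1 - arccos(rho)/pi.
rho_sign = np.cos(np.pi * np.mean(np.sign(x) != np.sign(y)))
\end{verbatim}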
Image matting is an important vision problem. The mainstream methods combine sampling-based and propagation-based approaches. In this paper, we handle the combination with a normalized weighting parameter that controls the relative contribution of the information from sampling and from propagation. A reasonable value range for this parameter is derived from statistics on the standard benchmark dataset. The matting is further improved by introducing semi-supervised learning iterations, which automatically refine the trimap without user interaction. This is especially beneficial when the trimap is coarse. Experimental results on the standard benchmark dataset show that both the normalized weighting parameter and the semi-supervised learning iterations significantly improve matting performance.
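A minimal sketch of the combination step, assuming per-pixel alpha estimates from the sampling and propagation stages; the blending form and symbol names are illustrative, and the paper's statistics-based value range for the weight is not reproduced here.

\begin{verbatim}
import numpy as np

def combine_alpha(alpha_samp, alpha_prop, w):
    # Normalized weight w in [0, 1] controls the relative contribution
    # of sampling-based versus propagation-based alpha estimates.
    return np.clip(w * alpha_samp + (1.0 - w) * alpha_prop, 0.0, 1.0)

# Illustrative usage on two candidate alpha maps.
rng = np.random.default_rng(0)
a_samp, a_prop = rng.random((4, 4)), rng.random((4, 4))
alpha = combine_alpha(a_samp, a_prop, w=0.5)
\end{verbatim}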