Negative sampling approaches are prevalent in implicit collaborative filtering for obtaining negative labels from massive unlabeled data. Efficiency and effectiveness, the two major concerns in negative sampling, are still not fully achieved by recent works, which rely on complicated structures and overlook the risk of false-negative instances. In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning, and that false negatives tend to have stable predictions over many training iterations. These findings motivate us to simplify the model by sampling from a designed memory that stores only a few important candidates and, more importantly, to tackle the untouched false-negative problem by favouring high-variance samples stored in the memory, which achieves efficient sampling of high-quality true negatives. Empirical results on two synthetic datasets and three real-world datasets demonstrate both the robustness and the superiority of our negative sampling method.
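As a minimal sketch of the variance-based idea above, the snippet below tracks each memorized candidate's predictions over a few iterations and samples negatives in proportion to their prediction variance. All names and sizes (MEMORY_SIZE, HISTORY, the matrix-factorization scorer) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim, MEMORY_SIZE, HISTORY = 1000, 16, 32, 5

user_vec = rng.normal(size=dim)
item_vecs = rng.normal(size=(n_items, dim))

# Memory: a few candidate negatives plus their recent prediction scores.
memory = rng.choice(n_items, size=MEMORY_SIZE, replace=False)
score_history = np.zeros((MEMORY_SIZE, HISTORY))

for t in range(HISTORY):
    user_vec += 0.1 * rng.normal(size=dim)   # stand-in for a training update
    score_history[:, t] = item_vecs[memory] @ user_vec

# Favour high-variance candidates: false negatives tend to have stable
# predictions, so low variance across iterations is a warning sign.
variance = score_history.var(axis=1)
probs = variance / variance.sum()
negative = rng.choice(memory, p=probs)
print("sampled negative item:", negative)
```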
Recent years have witnessed the popularity of Graph Neural Networks (GNNs) in various scenarios. To obtain optimal data-specific GNN architectures, researchers turn to neural architecture search (NAS) methods, which have made impressive progress in discovering effective architectures for convolutional neural networks. Two preliminary works, GraphNAS and Auto-GNN, made the first attempts to apply NAS methods to GNNs. Despite their promising results, GraphNAS and Auto-GNN suffer from several drawbacks in expressive capability and search efficiency due to their designed search spaces. To overcome these drawbacks, we propose the SNAG framework (Simplified Neural Architecture search for Graph neural networks), which consists of a novel search space and a reinforcement-learning-based search algorithm. Extensive experiments on real-world datasets demonstrate the effectiveness of the SNAG framework compared to human-designed GNNs and NAS methods, including GraphNAS and Auto-GNN.
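To make the notion of a search space concrete, here is a toy sketch of a simplified GNN search space with a random controller standing in for the reinforcement-learning search; the specific choice lists are illustrative assumptions, not SNAG's actual space.

```python
import random

SEARCH_SPACE = {
    "aggregator": ["mean", "max", "sum", "gcn", "gat"],
    "activation": ["relu", "tanh", "elu"],
    "skip_connection": [True, False],
    "hidden_dim": [32, 64, 128],
}

def sample_architecture(space):
    """One step of a sampling-based controller (random stand-in for RL)."""
    return {name: random.choice(choices) for name, choices in space.items()}

random.seed(0)
print(sample_architecture(SEARCH_SPACE))
```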
Matrix learning is at the core of many machine learning problems. To encourage a low-rank matrix solution, besides using the traditional convex nuclear-norm regularizer, a popular recent trend is to use nonconvex regularizers that adaptively penalize singular values. They offer better recovery performance, but require computing an expensive singular value decomposition (SVD) in each iteration. To remove this bottleneck, we consider the "nuclear norm minus Frobenius norm" regularizer. Besides having nice theoretical properties on recovery and shrinkage like the other nonconvex regularizers, it can be reformulated into a factored form that can be efficiently optimized by gradient-based algorithms while avoiding SVD computations altogether. Extensive low-rank matrix completion experiments on a number of synthetic and real-world data sets show that the proposed method obtains state-of-the-art recovery performance and is much more efficient than existing low-rank convex/nonconvex regularization and matrix factorization algorithms.
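A minimal sketch of the factored formulation is below: matrix completion with a "nuclear norm minus Frobenius norm" regularizer, where the nuclear norm is handled through its standard variational factored form and the whole objective is optimized by plain gradient descent, with no SVD. The problem sizes, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, lam, lr = 50, 40, 5, 0.1, 0.005

M = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))  # true low-rank matrix
mask = rng.random((m, n)) < 0.3                        # observed entries

U, V = rng.normal(size=(m, r)), rng.normal(size=(r, n))
for _ in range(3000):
    X = U @ V
    R = np.where(mask, X - M, 0.0)          # residual on observed entries
    fro = np.linalg.norm(X) + 1e-12
    # Nuclear norm via its variational form, minus the Frobenius term.
    gU = R @ V.T + lam * U - lam * (X @ V.T) / fro
    gV = U.T @ R + lam * V - lam * (U.T @ X) / fro
    U, V = U - lr * gU, V - lr * gV

test_err = np.linalg.norm(np.where(mask, 0, U @ V - M)) / np.linalg.norm(M)
print(f"relative error on unobserved entries: {test_err:.3f}")
```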
With the rapid development of knowledge bases (KBs), the link prediction task, which completes KBs with missing facts, has been broadly studied, especially on binary relational KBs (a.k.a. knowledge graphs) with powerful tensor-decomposition-based methods. However, the ubiquitous n-ary relational KBs with higher-arity relational facts have received less attention, and existing translation-based and neural-network-based approaches for them have weak expressiveness and high complexity in modeling various relations. Tensor decomposition has not been considered for n-ary relational KBs, and directly extending tensor-decomposition-based methods from binary relational KBs to the n-ary case does not yield satisfactory results, due to exponential model complexity and their strong assumptions on binary relations. To generalize tensor decomposition to n-ary relational KBs, in this work we propose GETD, a generalized model based on Tucker decomposition and Tensor Ring decomposition. The existing negative sampling technique is also generalized to the n-ary case for GETD. In addition, we theoretically prove that GETD is fully expressive, i.e., it can completely represent any KB. Extensive evaluations on two representative n-ary relational KB datasets demonstrate the superior performance of GETD, which significantly improves on the state-of-the-art methods by over 15%. Moreover, GETD also achieves state-of-the-art results on benchmark binary relational KB datasets.
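The snippet below sketches how a tensor-ring-factored core tensor can score a single 3-ary fact, in the spirit of GETD; the shapes, TR-rank, and contraction code are illustrative assumptions rather than the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_e, d_r, tr_rank = 8, 8, 4             # embedding dims and TR-rank

# One TR core per mode of the (relation, e1, e2, e3) core tensor.
dims = [d_r, d_e, d_e, d_e]
cores = [rng.normal(size=(tr_rank, d, tr_rank)) * 0.1 for d in dims]

def tr_core_tensor(cores):
    """Reconstruct the full core tensor from its tensor-ring factors."""
    W = cores[0]                         # shape (rank, d0, rank)
    for G in cores[1:]:
        W = np.einsum('...a,aib->...ib', W, G)
    return np.einsum('a...a->...', W)    # trace over the ring bond

W = tr_core_tensor(cores)                # shape (d_r, d_e, d_e, d_e)

rel = rng.normal(size=d_r)
e1, e2, e3 = (rng.normal(size=d_e) for _ in range(3))
score = np.einsum('rijk,r,i,j,k->', W, rel, e1, e2, e3)
print("fact score:", score)
```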
Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes. The community has paid increasing attention to boosting performance by improving the image pre-processing module (e.g., rectification and deblurring) or the sequence translator. However, another critical module, i.e., the feature sequence extractor, has not been extensively explored. In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR), which searches data-dependent backbones to boost text recognition performance. First, we design a domain-specific search space for STR, which contains both choices of operations and constraints on the downsampling path. Then, we propose a two-step search algorithm that decouples operations from the downsampling path, for an efficient search in the given space. Experiments demonstrate that, by searching data-dependent backbones, AutoSTR can outperform state-of-the-art approaches on standard benchmarks with far fewer FLOPs and model parameters.
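As a toy illustration of the downsampling-path constraint, the snippet below enumerates per-layer stride sequences whose products hit a target overall downsampling ratio; the stride set, layer count, and target are assumptions for illustration.

```python
from itertools import product

N_LAYERS = 5
STRIDES = [(1, 1), (2, 1), (2, 2)]      # candidate per-layer (h, w) strides
TARGET = (8, 4)                          # overall (height, width) downsampling

def valid_paths():
    for path in product(STRIDES, repeat=N_LAYERS):
        h = w = 1
        for sh, sw in path:
            h, w = h * sh, w * sw
        if (h, w) == TARGET:
            yield path

paths = list(valid_paths())
print(f"{len(paths)} valid downsampling paths, e.g. {paths[0]}")
```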
Knowledge graph (KG) embedding is a fundamental problem in mining relational patterns. It aims to encode the entities and relations in a KG into a low-dimensional vector space that can be used by subsequent algorithms. Many KG embedding models have been proposed to learn the interactions between entities and relations, which carry meaningful semantic information. However, structural information, which encodes the local topology among entities, is also important in a KG. In this work, we propose S2E, which distills structural information and combines it with semantic information for different KGs by casting the problem as neural architecture search (NAS). First, we analyze the difficulty of using a unified model to solve the distillation problem. Based on this analysis, we define a path distiller that recurrently combines structural and semantic information along relational paths, which are sampled to preserve both local topologies and semantics. Then, inspired by the recent success of NAS, we design a recurrent-network-based search space for specific KG tasks and propose a natural gradient (NG) based search algorithm to update architectures. Experimental results demonstrate that the models found by S2E outperform human-designed ones, and that the NG-based search algorithm is efficient compared with other NAS methods. Moreover, ours is the first NAS method for RNNs that can find architectures performing better than human-designed models.
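A minimal sketch of the recurrent combination step is shown below: a plain tanh RNN cell folds the semantic and structural embeddings of each path step into a running state. The cell itself is an illustrative stand-in for a searched architecture, and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, path_len = 16, 4

W_h = rng.normal(size=(dim, dim)) * 0.1
W_x = rng.normal(size=(dim, 2 * dim)) * 0.1

# Each path step carries a semantic (relation) and structural (entity) part.
path = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(path_len)]

h = np.zeros(dim)
for semantic, structural in path:
    x = np.concatenate([semantic, structural])
    h = np.tanh(W_h @ h + W_x @ x)       # recurrent combination step

print("distilled path representation:", h[:4], "...")
```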
Sample-selection approaches, which attempt to pick clean instances out of a noisy training data set, have become a promising direction for robust learning from corrupted labels. These methods all build on the memorization effect: deep networks learn easy patterns first and then gradually over-fit the training data set. In this paper, we show that properly selecting instances, so that the training process benefits the most from the memorization effect, is a hard problem. Specifically, memorization can depend heavily on many factors, e.g., the data set and the network architecture. Nonetheless, there still exist general patterns in how memorization occurs. These facts motivate us to exploit memorization with automated machine learning (AutoML) techniques. First, we design an expressive yet compact search space based on the observed general patterns. Then, we propose a natural-gradient-based search algorithm to search through the space efficiently. Finally, extensive experiments on both synthetic and benchmark data sets demonstrate that the proposed method is not only much more efficient than existing AutoML algorithms but also achieves much better performance than state-of-the-art approaches for learning from corrupted labels.
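As a concrete sketch of memorization-based selection, the snippet below keeps the R(t) fraction of smallest-loss examples each epoch, where R(t) is a simple parametric schedule of the kind such a search could tune; the schedule form and its parameters are assumptions, not the searched space itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def selection_rate(epoch, noise=0.2, warmup=10):
    """Keep everything early, then ramp down toward 1 - noise."""
    return 1.0 - noise * min(epoch / warmup, 1.0)

losses = rng.exponential(size=1000)          # per-example training losses
epoch = 15
k = int(selection_rate(epoch) * len(losses))
selected = np.argsort(losses)[:k]            # small-loss ("clean") instances
print(f"epoch {epoch}: keeping {len(selected)} of {len(losses)} examples")
```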
The interaction function (IFC), which captures interactions between users and items, is of great importance in collaborative filtering (CF). The inner product is the most popular IFC due to its success in low-rank matrix factorization. However, interactions in real-world applications can be highly complex. Many other operations (such as plus and concatenation) have also been proposed and can possibly offer better performance than the inner product. In this paper, motivated by the success of automated machine learning (AutoML), we propose to search for proper interaction functions (SIF) for CF tasks. We first design an expressive search space for SIF by reviewing and generalizing existing CF approaches. We then represent the search space as a structured multi-layer perceptron and design a stochastic gradient descent algorithm that can simultaneously update both the architecture and the learning parameters. Experimental results demonstrate that the proposed method can be much more efficient than popular AutoML approaches, and also obtains much better prediction performance than state-of-the-art CF approaches.
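For intuition, here is a toy catalogue of candidate interaction functions over user and item embeddings, of the kind the SIF search space generalizes; the operation list and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
u, v = rng.normal(size=dim), rng.normal(size=dim)
w = rng.normal(size=2 * dim)                 # weights for the concat op

# Candidate interaction functions mapping (user, item) to a scalar score.
OPS = {
    "inner":  lambda u, v: float(u @ v),
    "plus":   lambda u, v: float((u + v).sum()),
    "max":    lambda u, v: float(np.maximum(u, v).sum()),
    "concat": lambda u, v: float(w @ np.concatenate([u, v])),
}

for name, op in OPS.items():
    print(f"{name:6s} -> {op(u, v):+.3f}")
```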