Link prediction -- the process of uncovering missing links in a complex network -- is an important problem in information sciences, with applications ranging from social sciences to molecular biology. Recent advances in neural graph embeddings have proposed an end-to-end way of learning latent vector representations of nodes, with successful application in link prediction tasks. Yet, our understanding of the internal mechanisms of such approaches has been rather limited, and only very recently we have witnessed the development of a very compelling connection to the mature matrix factorization theory. In this work, we make an important contribution to our understanding of the interplay between the skip-gram powered neural graph embedding algorithms and the matrix factorization via SVD. In particular, we show that the link prediction accuracy of graph embeddings strongly depends on the transformations of the original graph co-occurrence matrix that they decompose, sometimes resulting in staggering boosts of accuracy performance on link prediction tasks. Our improved approach to learning low-rank factorization embeddings that incorporate information from unlikely pairs of nodes yields results on par with the state-of-the-art link prediction performance achieved by a complex neural graph embedding model
Collaborative filtering (CF) has achieved great success in the field of recommender systems. In recent years, many novel CF models, particularly those based on deep learning or graph techniques, have been proposed for a variety of recommendation tasks, such as rating prediction and item ranking. These newly published models usually demonstrate their performance in comparison to baselines or existing models in terms of accuracy improvements. However, others have pointed out that many newly proposed models are not as strong as expected and are outperformed by very simple baselines. This paper proposes a simple linear model based on Matrix Factorization (MF), called UserReg, which regularizes users' latent representations with explicit feedback information for rating prediction. We compare the effectiveness of UserReg with three linear CF models that are widely-used as baselines, and with a set of recently proposed complex models that are based on deep learning or graph techniques. Experimental results show that UserReg achieves overall better performance than the fine-tuned baselines considered and is highly competitive when compared with other recently proposed models. We conclude that UserReg can be used as a strong baseline for future CF research.
FPN is a common component used in object detectors, it supplements multi-scale information by adjacent level features interpolation and summation. However, due to the existence of nonlinear operations and the convolutional layers with different output dimensions, the relationship between different levels is much more complex, the pixel-wise summation is not an efficient approach. In this paper, we first analyze the design defects from pixel level and feature map level. Then, we design a novel parameter-free feature pyramid networks named Dual Refinement Feature Pyramid Networks (DRFPN) for the problems. Specifically, DRFPN consists of two modules: Spatial Refinement Block (SRB) and Channel Refinement Block (CRB). SRB learns the location and content of sampling points based on contextual information between adjacent levels. CRB learns an adaptive channel merging method based on attention mechanism. Our proposed DRFPN can be easily plugged into existing FPN-based models. Without bells and whistles, for two-stage detectors, our model outperforms different FPN-based counterparts by 1.6 to 2.2 AP on the COCO detection benchmark, and 1.5 to 1.9 AP on the COCO segmentation benchmark. For one-stage detectors, DRFPN improves anchor-based RetinaNet by 1.9 AP and anchor-free FCOS by 1.3 AP when using ResNet50 as backbone. Extensive experiments verifies the robustness and generalization ability of DRFPN. The code will be made publicly available.
Traffic forecasting is important for the success of intelligent transportation systems. Deep learning models, including convolution neural networks and recurrent neural networks, have been extensively applied in traffic forecasting problems to model spatial and temporal dependencies. In recent years, to model the graph structures in transportation systems as well as contextual information, graph neural networks have been introduced and have achieved state-of-the-art performance in a series of traffic forecasting problems. In this survey, we review the rapidly growing body of research using different graph neural networks, e.g. graph convolutional and graph attention networks, in various traffic forecasting problems, e.g. road traffic flow and speed forecasting, passenger flow forecasting in urban rail transit systems, and demand forecasting in ride-hailing platforms. We also present a comprehensive list of open data and source resources for each problem and identify future research directions. To the best of our knowledge, this paper is the first comprehensive survey that explores the application of graph neural networks for traffic forecasting problems. We have also created a public GitHub repository where the latest papers, open data, and source resources will be updated.
Convolutional neural networks (CNNs) are widely used to recognize the user's state through electroencephalography (EEG) signals. In the previous studies, the EEG signals are usually fed into the CNNs in the form of high-dimensional raw data. However, this approach makes it difficult to exploit the brain connectivity information that can be effective in describing the functional brain network and estimating the perceptual state of the user. We introduce a new classification system that utilizes brain connectivity with a CNN and validate its effectiveness via the emotional video classification by using three different types of connectivity measures. Furthermore, two data-driven methods to construct the connectivity matrix are proposed to maximize classification performance. Further analysis reveals that the level of concentration of the brain connectivity related to the emotional property of the target video is correlated with classification performance.
Although it is widely known that Gaussian processes can be conditioned on observations of the gradient, this functionality is of limited use due to the prohibitive computational cost of $\mathcal{O}(N^3 D^3)$ in data points $N$ and dimension $D$. The dilemma of gradient observations is that a single one of them comes at the same cost as $D$ independent function evaluations, so the latter are often preferred. Careful scrutiny reveals, however, that derivative observations give rise to highly structured kernel Gram matrices for very general classes of kernels (inter alia, stationary kernels). We show that in the low-data regime $N<D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2D + (N^2)^3)$ (i.e., linear in the number of dimensions) and, in special cases, to $\mathcal{O}(N^2D + N^3)$. This reduction in complexity opens up new use-cases for inference with gradients especially in the high-dimensional regime, where the information-to-cost ratio of gradient observations significantly increases. We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
Symmetric nonnegative matrix factorization (SNMF) has demonstrated to be a powerful method for data clustering. However, SNMF is mathematically formulated as a non-convex optimization problem, making it sensitive to the initialization of variables. Inspired by ensemble clustering that aims to seek a better clustering result from a set of clustering results, we propose self-supervised SNMF (S$^3$NMF), which is capable of boosting clustering performance progressively by taking advantage of the sensitivity to initialization characteristic of SNMF, without relying on any additional information. Specifically, we first perform SNMF repeatedly with a random nonnegative matrix for initialization each time, leading to multiple decomposed matrices. Then, we rank the quality of the resulting matrices with adaptively learned weights, from which a new similarity matrix that is expected to be more discriminative is reconstructed for SNMF again. These two steps are iterated until the stopping criterion/maximum number of iterations is achieved. We mathematically formulate S$^3$NMF as a constraint optimization problem, and provide an alternative optimization algorithm to solve it with the theoretical convergence guaranteed. Extensive experimental results on $10$ commonly used benchmark datasets demonstrate the significant advantage of our S$^3$NMF over $12$ state-of-the-art methods in terms of $5$ quantitative metrics. The source code is publicly available at https://github.com/jyh-learning/SSSNMF.
In multi-object tracking, the tracker maintains in its memory the appearance and motion information for each object in the scene. This memory is utilized for finding matches between tracks and detections and is updated based on the matching result. Many approaches model each target in isolation and lack the ability to use all the targets in the scene to jointly update the memory. This can be problematic when there are similar looking objects in the scene. In this paper, we solve the problem of simultaneously considering all tracks during memory updating, with only a small spatial overhead, via a novel multi-track pooling module. We additionally propose a training strategy adapted to multi-track pooling which generates hard tracking episodes online. We show that the combination of these innovations results in a strong discriminative appearance model, enabling the use of greedy data association to achieve online tracking performance. Our experiments demonstrate real-time, state-of-the-art performance on public multi-object tracking (MOT) datasets.
Performance of fingerprint recognition algorithms substantially rely on fine features extracted from fingerprints. Apart from minutiae and ridge patterns, pore features have proven to be usable for fingerprint recognition. Although features from minutiae and ridge patterns are quite attainable from low-resolution images, using pore features is practical only if the fingerprint image is of high resolution which necessitates a model that enhances the image quality of the conventional 500 ppi legacy fingerprints preserving the fine details. To find a solution for recovering pore information from low-resolution fingerprints, we adopt a joint learning-based approach that combines both super-resolution and pore detection networks. Our modified single image Super-Resolution Generative Adversarial Network (SRGAN) framework helps to reliably reconstruct high-resolution fingerprint samples from low-resolution ones assisting the pore detection network to identify pores with a high accuracy. The network jointly learns a distinctive feature representation from a real low-resolution fingerprint sample and successfully synthesizes a high-resolution sample from it. To add discriminative information and uniqueness for all the subjects, we have integrated features extracted from a deep fingerprint verifier with the SRGAN quality discriminator. We also add ridge reconstruction loss, utilizing ridge patterns to make the best use of extracted features. Our proposed method solves the recognition problem by improving the quality of fingerprint images. High recognition accuracy of the synthesized samples that is close to the accuracy achieved using the original high-resolution images validate the effectiveness of our proposed model.
Federated Learning is a new learning scheme for collaborative training a shared prediction model while keeping data locally on participating devices. In this paper, we study a new model of multiple federated learning services at the multi-access edge computing server. Accordingly, the sharing of CPU resources among learning services at each mobile device for the local training process and allocating communication resources among mobile devices for exchanging learning information must be considered. Furthermore, the convergence performance of different learning services depends on the hyper-learning rate parameter that needs to be precisely decided. Towards this end, we propose a joint resource optimization and hyper-learning rate control problem, namely MS-FEDL, regarding the energy consumption of mobile devices and overall learning time. We design a centralized algorithm based on the block coordinate descent method and a decentralized JP-miADMM algorithm for solving the MS-FEDL problem. Different from the centralized approach, the decentralized approach requires many iterations to obtain but it allows each learning service to independently manage the local resource and learning process without revealing the learning service information. Our simulation results demonstrate the convergence performance of our proposed algorithms and the superior performance of our proposed algorithms compared to the heuristic strategy.