Embedding learning has found widespread applications in recommendation systems and natural language modeling, among other domains. To learn quality embeddings efficiently, adaptive learning rate algorithms have demonstrated superior empirical performance over SGD, largely accredited to their token-dependent learning rate. However, the underlying mechanism for the efficiency of token-dependent learning rate remains underexplored. We show that incorporating frequency information of tokens in the embedding learning problems leads to provably efficient algorithms, and demonstrate that common adaptive algorithms implicitly exploit the frequency information to a large extent. Specifically, we propose (Counter-based) Frequency-aware Stochastic Gradient Descent, which applies a frequency-dependent learning rate for each token, and exhibits provable speed-up compared to SGD when the token distribution is imbalanced. Empirically, we show the proposed algorithms are able to improve or match adaptive algorithms on benchmark recommendation tasks and a large-scale industrial recommendation system, closing the performance gap between SGD and adaptive algorithms. Our results are the first to show token-dependent learning rate provably improves convergence for non-convex embedding learning problems.
Single image dehazing is a prerequisite which affects the performance of many computer vision tasks and has attracted increasing attention in recent years. However, most existing dehazing methods emphasize more on haze removal but less on the detail recovery of the dehazed images. In this paper, we propose a single image dehazing method with an independent Detail Recovery Network (DRN), which considers capturing the details from the input image over a separate network and then integrates them into a coarse dehazed image. The overall network consists of two independent networks, named DRN and the dehazing network respectively. Specifically, the DRN aims to recover the dehazed image details through local and global branches respectively. The local branch can obtain local detail information through the convolution layer and the global branch can capture more global information by the Smooth Dilated Convolution (SDC). The detail feature map is fused into the coarse dehazed image to obtain the dehazed image with rich image details. Besides, we integrate the DRN, the physical-model-based dehazing network and the reconstruction loss into an end-to-end joint learning framework. Extensive experiments on the public image dehazing datasets (RESIDE-Indoor, RESIDE-Outdoor and the TrainA-TestA) illustrate the effectiveness of the modules in the proposed method and show that our method outperforms the state-of-the-art dehazing methods both quantitatively and qualitatively. The code is released in https://github.com/YanLi-LY/Dehazing-DRN.
Motor imagery classification is of great significance to humans with mobility impairments, and how to extract and utilize the effective features from motor imagery electroencephalogram(EEG) channels has always been the focus of attention. There are many different methods for the motor imagery classification, but the limited understanding on human brain requires more effective methods for extracting the features of EEG data. Graph neural networks(GNNs) have demonstrated its effectiveness in classifying graph structures; and the use of GNN provides new possibilities for brain structure connection feature extraction. In this paper we propose a novel graph neural network based on the mutual information of the raw EEG channels called MutualGraphNet. We use the mutual information as the adjacency matrix combined with the spatial temporal graph convolution network(ST-GCN) could extract the transition rules of the motor imagery electroencephalogram(EEG) channels data more effectively. Experiments are conducted on motor imagery EEG data set and we compare our model with the current state-of-the-art approaches and the results suggest that MutualGraphNet is robust enough to learn the interpretable features and outperforms the current state-of-the-art methods.
Bregman proximal point algorithm (BPPA), as one of the centerpieces in the optimization toolbox, has been witnessing emerging applications. With simple and easy to implement update rule, the algorithm bears several compelling intuitions for empirical successes, yet rigorous justifications are still largely unexplored. We study the computational properties of BPPA through classification tasks with separable data, and demonstrate provable algorithmic regularization effects associated with BPPA. We show that BPPA attains non-trivial margin, which closely depends on the condition number of the distance generating function inducing the Bregman divergence. We further demonstrate that the dependence on the condition number is tight for a class of problems, thus showing the importance of divergence in affecting the quality of the obtained solutions. In addition, we extend our findings to mirror descent (MD), for which we establish similar connections between the margin and Bregman divergence. We demonstrate through a concrete example, and show BPPA/MD converges in direction to the maximal margin solution with respect to the Mahalanobis distance. Our theoretical findings are among the first to demonstrate the benign learning properties BPPA/MD, and also provide corroborations for a careful choice of divergence in the algorithmic design.
Recommending cold-start items is a long-standing and fundamental challenge in recommender systems. Without any historical interaction on cold-start items, CF scheme fails to use collaborative signals to infer user preference on these items. To solve this problem, extensive studies have been conducted to incorporate side information into the CF scheme. Specifically, they employ modern neural network techniques (e.g., dropout, consistency constraint) to discover and exploit the coalition effect of content features and collaborative representations. However, we argue that these works less explore the mutual dependencies between content features and collaborative representations and lack sufficient theoretical supports, thus resulting in unsatisfactory performance. In this work, we reformulate the cold-start item representation learning from an information-theoretic standpoint. It aims to maximize the mutual dependencies between item content and collaborative signals. Specifically, the representation learning is theoretically lower-bounded by the integration of two terms: mutual information between collaborative embeddings of users and items, and mutual information between collaborative embeddings and feature representations of items. To model such a learning process, we devise a new objective function founded upon contrastive learning and develop a simple yet effective Contrastive Learning-based Cold-start Recommendation framework(CLCRec). In particular, CLCRec consists of three components: contrastive pair organization, contrastive embedding, and contrastive optimization modules. It allows us to preserve collaborative signals in the content representations for both warm and cold-start items. Through extensive experiments on four publicly accessible datasets, we observe that CLCRec achieves significant improvements over state-of-the-art approaches in both warm- and cold-start scenarios.
Multi-agent reinforcement learning (MARL) becomes more challenging in the presence of more agents, as the capacity of the joint state and action spaces grows exponentially in the number of agents. To address such a challenge of scale, we identify a class of cooperative MARL problems with permutation invariance, and formulate it as a mean-field Markov decision processes (MDP). To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture. We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence. Moreover, its sample complexity is independent of the number of agents. We validate the theoretical advantages of MF-PPO with numerical experiments in the multi-agent particle environment (MPE). In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors with a smaller number of model parameters, which is the key to its generalization performance.
In order to overcome the accommodation and convergence (A-C) conflict that commonly causes visual fatigue in AR display, we propose a Maxwellian-viewing-super-multi-view (MV-SMV) near-eye display system based on a Pancharatnam-Berry optical element (PBOE). The PBOE, which is constituted with an array of high-efficiency polarization gratings, is implemented to direct different views to different directions simultaneously, constructing the 3D light field. Meanwhile, each view is like a Maxwellian view display that possesses a small viewpoint and a large depth of field (DOF). Hence, the MV-SMV display can display virtual images with correct accommodation depth cue within a large DOF. We implement a proof-of-concept MV-SMV display prototype with 3 x 1 and 3 x 2 viewpoints using a 1D PBOE and a 2D PBOE, respectively, and achieve a DOF of 4.37 diopters experimentally.
Given an on-board diagnostics (OBD) dataset and a physics-based emissions prediction model, this paper aims to develop an accurate and computational-efficient AI (Artificial Intelligence) method that predicts vehicle emissions. The problem is of societal importance because vehicular emissions lead to climate change and impact human health. This problem is challenging because the OBD data does not contain enough parameters needed by high-order physics models. Conversely, related work has shown that low-order physics models have poor predictive accuracy when using available OBD data. This paper uses a divergent window co-occurrence pattern detection method to develop a spatiotemporal variability-aware AI model for predicting emission values from the OBD datasets. We conducted a case study using real-world OBD data from a local public transportation agency. Results show that the proposed AI method has approximately 65% improved predictive accuracy than a non-AI low-order physics model and is approximately 35% more accurate than a baseline model.