Modern recommender systems usually present items as one-dimensional ranking list. Recently there is a trend in e-commerce that the recommended items are organized as two-dimensional grid-based panels where users can view the items in both vertical and horizontal directions. Presenting items in grid-based result panels poses new challenges to recommender systems because existing models are all designed to output sequential lists while the slots in a grid-based panel have no explicit order. Directly converting the item rankings into grids (e.g., pre-defining an order on the slots)overlooks the user-specific behavioral patterns on grid-based pan-els and inevitably hurts the user experiences. To address this issue, we propose a novel Markov decision process (MDP) to place the items in 2D grid-based result panels at the final re-ranking stage of the recommender systems. The model, referred to as Panel-MDP, takes an initial item ranking from the early stages as the input. Then, it defines the MDP discrete time steps as the ranks in the initial ranking list, and the actions as the slots in the grid-based panels, plus a NULL action. At each time step, Panel-MDP sequentially takes an action of selecting one slot for placing an item of the initial ranking list, or discarding the item if NULL action is selected. The process is continued until all of the slots are filled. The reinforcement learning algorithm of DQN is employed to implement and learn the parameters in the Panel-MDP. Experiments on a dataset collected from a widely-used e-commerce app demonstrated the superiority ofPanel-MDP in terms of recommending 2D grid-based result panels.
In this work, we present a fully self-supervised framework for semantic segmentation(FS^4). A fully bootstrapped strategy for semantic segmentation, which saves efforts for the huge amount of annotation, is crucial for building customized models from end-to-end for open-world domains. This application is eagerly needed in realistic scenarios. Even though recent self-supervised semantic segmentation methods have gained great progress, these works however heavily depend on the fully-supervised pretrained model and make it impossible a fully self-supervised pipeline. To solve this problem, we proposed a bootstrapped training scheme for semantic segmentation, which fully leveraged the global semantic knowledge for self-supervision with our proposed PGG strategy and CAE module. In particular, we perform pixel clustering and assignments for segmentation supervision. Preventing it from clustering a mess, we proposed 1) a pyramid-global-guided (PGG) training strategy to supervise the learning with pyramid image/patch-level pseudo labels, which are generated by grouping the unsupervised features. The stable global and pyramid semantic pseudo labels can prevent the segmentation from learning too many clutter regions or degrading to one background region; 2) in addition, we proposed context-aware embedding (CAE) module to generate global feature embedding in view of its neighbors close both in space and appearance in a non-trivial way. We evaluate our method on the large-scale COCO-Stuff dataset and achieved 7.19 mIoU improvements on both things and stuff objects
As an important algorithm in deep reinforcement learning, advantage actor critic (A2C) has been widely succeeded in both discrete and continuous control tasks with raw pixel inputs, but its sample efficiency still needs to improve more. In traditional reinforcement learning, actor-critic algorithms generally use the recursive least squares (RLS) technology to update the parameter of linear function approximators for accelerating their convergence speed. However, A2C algorithms seldom use this technology to train deep neural networks (DNNs) for improving their sample efficiency. In this paper, we propose two novel RLS-based A2C algorithms and investigate their performance. Both proposed algorithms, called RLSSA2C and RLSNA2C, use the RLS method to train the critic network and the hidden layers of the actor network. The main difference between them is at the policy learning step. RLSSA2C uses an ordinary first-order gradient descent algorithm and the standard policy gradient to learn the policy parameter. RLSNA2C uses the Kronecker-factored approximation, the RLS method and the natural policy gradient to learn the compatible parameter and the policy parameter. In addition, we analyze the complexity and convergence of both algorithms, and present three tricks for further improving their convergence speed. Finally, we demonstrate the effectiveness of both algorithms on 40 games in the Atari 2600 environment and 11 tasks in the MuJoCo environment. From the experimental results, it is shown that our both algorithms have better sample efficiency than the vanilla A2C on most games or tasks, and have higher computational efficiency than other two state-of-the-art algorithms.
Convolutional neural networks (CNNs) have succeeded in many practical applications. However, their high computation and storage requirements often make them difficult to deploy on resource-constrained devices. In order to tackle this issue, many pruning algorithms have been proposed for CNNs, but most of them can't prune CNNs to a reasonable level. In this paper, we propose a novel algorithm for training and pruning CNNs based on the recursive least squares (RLS) optimization. After training a CNN for some epochs, our algorithm combines inverse input autocorrelation matrices and weight matrices to evaluate and prune unimportant input channels or nodes layer by layer. Then, our algorithm will continue to train the pruned network, and won't do the next pruning until the pruned network recovers the full performance of the old network. Besides for CNNs, the proposed algorithm can be used for feedforward neural networks (FNNs). Three experiments on MNIST, CIFAR-10 and SVHN datasets show that our algorithm can achieve the more reasonable pruning and have higher learning efficiency than other four popular pruning algorithms.
In this paper we explore partial coherence as a tool for evaluating causal influence of one signal sequence on another. In some cases the signal sequence is sampled from a time- or space-series. The key idea is to establish a connection between questions of causality and questions of partial coherence. Once this connection is established, then a scale-invariant partial coherence statistic is used to resolve the question of causality. This coherence statistic is shown to be a likelihood ratio, and its null distribution is shown to be a Wilks Lambda. It may be computed from a composite covariance matrix or from its inverse, the information matrix. Numerical experiments demonstrate the application of partial coherence to the resolution of causality. Importantly, the method is model-free, depending on no generative model for causality.
Consider a system that integrates positioning and single-user millimeter wave (mmWave) communication, where the communication part adopts wavelength division multiplexing (WDM) and orbital angular momentum (OAM). This paper addresses the power allocation and high dimensional constellation design in short-range line-of-sight (LOS) environment, with stable communication links. We propose a map-assisted method to replace online estimation, feedback and computation with the look-up table searching. We explore the possibility of using a few patterns in the maps, and investigate the performance loss of using the optimal solution of one position for other positions. For power allocation, we first characterize the performance loss outside the OAM beam regions with only plane waves, and figure out that the loss is always small. However, in OAM beam regions, the performance loss has similar characteristics only at some specific positions. Based on numerical results, we illustrate that a few patterns can be adopted for all receiver locations in the map. We also investigate high dimensional constellation design, and prove that a fixed constellation can be adopted for the positions where the channel matrices are sufficiently close to be proportional. Similarly, we figure out that the constellation design for all receiver locations can be represented by a few constellation sets.
Existing road pothole detection approaches can be classified as computer vision-based or machine learning-based. The former approaches typically employ 2-D image analysis/understanding or 3-D point cloud modeling and segmentation algorithms to detect road potholes from vision sensor data. The latter approaches generally address road pothole detection using convolutional neural networks (CNNs) in an end-to-end manner. However, road potholes are not necessarily ubiquitous and it is challenging to prepare a large well-annotated dataset for CNN training. In this regard, while computer vision-based methods were the mainstream research trend in the past decade, machine learning-based methods were merely discussed. Recently, we published the first stereo vision-based road pothole detection dataset and a novel disparity transformation algorithm, whereby the damaged and undamaged road areas can be highly distinguished. However, there are no benchmarks currently available for state-of-the-art (SoTA) CNNs trained using either disparity images or transformed disparity images. Therefore, in this paper, we first discuss the SoTA CNNs designed for semantic segmentation and evaluate their performance for road pothole detection with extensive experiments. Additionally, inspired by graph neural network (GNN), we propose a novel CNN layer, referred to as graph attention layer (GAL), which can be easily deployed in any existing CNN to optimize image feature representations for semantic segmentation. Our experiments compare GAL-DeepLabv3+, our best-performing implementation, with nine SoTA CNNs on three modalities of training data: RGB images, disparity images, and transformed disparity images. The experimental results suggest that our proposed GAL-DeepLabv3+ achieves the best overall pothole detection accuracy on all training data modalities.
This paper presents a neural network for robust normal estimation on point clouds, named AdaFit, that can deal with point clouds with noise and density variations. Existing works use a network to learn point-wise weights for weighted least squares surface fitting to estimate the normals, which has difficulty in finding accurate normals in complex regions or containing noisy points. By analyzing the step of weighted least squares surface fitting, we find that it is hard to determine the polynomial order of the fitting surface and the fitting surface is sensitive to outliers. To address these problems, we propose a simple yet effective solution that adds an additional offset prediction to improve the quality of normal estimation. Furthermore, in order to take advantage of points from different neighborhood sizes, a novel Cascaded Scale Aggregation layer is proposed to help the network predict more accurate point-wise offsets and weights. Extensive experiments demonstrate that AdaFit achieves state-of-the-art performance on both the synthetic PCPNet dataset and the real-word SceneNN dataset.
Salient object detection is the pixel-level dense prediction task which can highlight the prominent object in the scene. Recently U-Net framework is widely used, and continuous convolution and pooling operations generate multi-level features which are complementary with each other. In view of the more contribution of high-level features for the performance, we propose a triplet transformer embedding module to enhance them by learning long-range dependencies across layers. It is the first to use three transformer encoders with shared weights to enhance multi-level features. By further designing scale adjustment module to process the input, devising three-stream decoder to process the output and attaching depth features to color features for the multi-modal fusion, the proposed triplet transformer embedding network (TriTransNet) achieves the state-of-the-art performance in RGB-D salient object detection, and pushes the performance to a new level. Experimental results demonstrate the effectiveness of the proposed modules and the competition of TriTransNet.