In reinforcement learning (RL), we always expect the agent to explore as many states as possible in the initial stage of training and exploit the explored information in the subsequent stage to discover the most returnable trajectory. Based on this principle, in this paper, we soften the proximal policy optimization by introducing the entropy and dynamically setting the temperature coefficient to balance the opportunity of exploration and exploitation. While maximizing the expected reward, the agent will also seek other trajectories to avoid the local optimal policy. Nevertheless, the increase of randomness induced by entropy will reduce the train speed in the early stage. Integrating the temporal-difference (TD) method and the general advantage estimator (GAE), we propose the dual-track advantage estimator (DTAE) to accelerate the convergence of value functions and further enhance the performance of the algorithm. Compared with other on-policy RL algorithms on the Mujoco environment, the proposed method not only significantly speeds up the training but also achieves the most advanced results in cumulative return.
Dialogue systems in the open domain have achieved great success due to large conversation data and the development of deep learning, but multi-turn scenarios are still a challenge because of the frequent coreference and information omission. In this paper, we investigate the incomplete utterance restoration since it has brought general improvement over multi-turn dialogue systems in recent studies. Inspired by the autoregression for generation and the sequence labeling for text editing, we propose a novel semi autoregressive generator (SARG) with the high efficiency and flexibility. Moreover, experiments on Restoration-200k show that our proposed model significantly outperforms the state-of-the-art models with faster inference speed.
The task of room layout estimation is to locate the wall-floor, wall-ceiling, and wall-wall boundaries. Most recent methods solve this problem based on edge/keypoint detection or semantic segmentation. However, these approaches have shown limited attention on the geometry of the dominant planes and the intersection between them, which has significant impact on room layout. In this work, we propose to incorporate geometric reasoning to deep learning for layout estimation. Our approach learns to infer the depth maps of the dominant planes in the scene by predicting the pixel-level surface parameters, and the layout can be generated by the intersection of the depth maps. Moreover, we present a new dataset with pixel-level depth annotation of dominant planes. It is larger than the existing datasets and contains both cuboid and non-cuboid rooms. Experimental results show that our approach produces considerable performance gains on both 2D and 3D datasets.
Previous AutoML pruning works utilized individual layer features to automatically prune filters. We analyze the correlation for two layers from different blocks which have a short-cut structure. It is found that, in one block, the deeper layer has many redundant filters which can be represented by filters in the former layer so that it is necessary to take information from other layers into consideration in pruning. In this paper, a graph pruning approach is proposed, which views any deep model as a topology graph. Graph PruningNet based on the graph convolution network is designed to automatically extract neighboring information for each node. To extract features from various topologies, Graph PruningNet is connected with Pruned Network by an individual fully connection layer for each node and jointly trained on a training dataset from scratch. Thus, we can obtain reasonable weights for any size of sub-network. We then search the best configuration of the Pruned Network by reinforcement learning. Different from previous work, we take the node features from well-trained Graph PruningNet, instead of the hand-craft features, as the states in reinforcement learning. Compared with other AutoML pruning works, our method has achieved the state-of-the-art under same conditions on ImageNet-2012. The code will be released on GitHub.
In this paper, we present an autonomous unmanned aerial vehicle (UAV) landing system based on visual navigation. We design the landmark as a topological pattern in order to enable the UAV to distinguish the landmark from the environment easily. In addition, a dynamic thresholding method is developed for image binarization to improve detection efficiency. The relative distance in the horizontal plane is calculated according to effective image information, and the relative height is obtained using a linear interpolation method. The landing experiments are performed on a static and a moving platform, respectively. The experimental results illustrate that our proposed landing system performs robustly and accurately.
Semantic road region segmentation is a high-level task, which paves the way towards road scene understanding. This paper presents a residual network trained for semantic road segmentation. Firstly, we represent the projections of road disparities in the v-disparity map as a linear model, which can be estimated by optimizing the v-disparity map using dynamic programming. This linear model is then utilized to reduce the redundant information in the left and right road images. The right image is also transformed into the left perspective view, which greatly enhances the road surface similarity between the two images. Finally, the processed stereo images and their disparity maps are concatenated to create a set of 3D images, which are then utilized to train our neural network. The experimental results illustrate that our network achieves a maximum F1-measure of approximately 91.19% when analyzing the images from the KITTI road dataset.
Expressivity is one of the most significant issues in assessing neural networks. In this paper, we provide a quantitative analysis of the expressivity from dynamic models, where Hilbert space is employed to analyze its convergence and criticality. From the feature mapping of several widely used activation functions made by Hermite polynomials, We found sharp declines or even saddle points in the feature space, which stagnate the information transfer in deep neural networks, then present an activation function design based on the Hermite polynomials for better utilization of spatial representation. Moreover, we analyze the information transfer of deep neural networks, emphasizing the convergence problem caused by the mismatch between input and topological structure. We also study the effects of input perturbations and regularization operators on critical expressivity. Finally, we verified the proposed method by multivariate time series prediction. The results show that the optimized DeepESN provides higher predictive performance, especially for long-term prediction. Our theoretical analysis reveals that deep neural networks use spatial domains for information representation and evolve to the edge of chaos as depth increases. In actual training, whether a particular network can ultimately arrive that depends on its ability to overcome convergence and pass information to the required network depth.
Visual cognition of the indoor environment can benefit from the spatial layout estimation, which is to represent an indoor scene with a 2D box on a monocular image. In this paper, we propose to fully exploit the edge and semantic information of a room image for layout estimation. More specifically, we present an encoder-decoder network with shared encoder and two separate decoders, which are composed of multiple deconvolution (transposed convolution) layers, to jointly learn the edge maps and semantic labels of a room image. We combine these two network predictions in a scoring function to evaluate the quality of the layouts, which are generated by ray sampling and from a predefined layout pool. Guided by the scoring function, we apply a novel refinement strategy to further optimize the layout hypotheses. Experimental results show that the proposed network can yield accurate estimates of edge maps and semantic labels. By fully utilizing the two different types of labels, the proposed method achieves state-of-the-art layout estimation performance on benchmark datasets.