



Abstract:In this paper, a novel Multi-agent Reinforcement Learning (MARL) approach, Multi-Agent Continuous Dynamic Policy Gradient (MACDPP) was proposed to tackle the issues of limited capability and sample efficiency in various scenarios controlled by multiple agents. It alleviates the inconsistency of multiple agents' policy updates by introducing the relative entropy regularization to the Centralized Training with Decentralized Execution (CTDE) framework with the Actor-Critic (AC) structure. Evaluated by multi-agent cooperation and competition tasks and traditional control tasks including OpenAI benchmarks and robot arm manipulation, MACDPP demonstrates significant superiority in learning capability and sample efficiency compared with both related multi-agent and widely implemented signal-agent baselines and therefore expands the potential of MARL in effectively learning challenging control scenarios.




Abstract:This paper addresses the prediction stability, prediction accuracy and control capability of the current probabilistic model-based reinforcement learning (MBRL) built on neural networks. A novel approach dropout-based probabilistic ensembles with trajectory sampling (DPETS) is proposed where the system uncertainty is stably predicted by combining the Monte-Carlo dropout and trajectory sampling in one framework. Its loss function is designed to correct the fitting error of neural networks for more accurate prediction of probabilistic models. The state propagation in its policy is extended to filter the aleatoric uncertainty for superior control capability. Evaluated by several Mujoco benchmark control tasks under additional disturbances and one practical robot arm manipulation task, DPETS outperforms related MBRL approaches in both average return and convergence velocity while achieving superior performance than well-known model-free baselines with significant sample efficiency. The open source code of DPETS is available at https://github.com/mrjun123/DPETS.




Abstract:In this paper, we investigate the effectiveness of contrastive learning methods for predicting grasp outcomes in an unsupervised manner. By utilizing a publicly available dataset, we demonstrate that contrastive learning methods perform well on the task of grasp outcomes prediction. Specifically, the dynamic-dictionary-based method with the momentum updating technique achieves a satisfactory accuracy of 81.83% using data from one single tactile sensor, outperforming other unsupervised methods. Our results reveal the potential of contrastive learning methods for applications in the field of robot grasping and highlight the importance of accurate grasp prediction for achieving stable grasps.




Abstract:Large-scale distributed training is increasingly becoming communication bound. Many gradient compression algorithms have been proposed to reduce the communication overhead and improve scalability. However, it has been observed that in some cases gradient compression may even harm the performance of distributed training. In this paper, we propose MergeComp, a compression scheduler to optimize the scalability of communication-efficient distributed training. It automatically schedules the compression operations to optimize the performance of compression algorithms without the knowledge of model architectures or system parameters. We have applied MergeComp to nine popular compression algorithms. Our evaluations show that MergeComp can improve the performance of compression algorithms by up to 3.83x without losing accuracy. It can even achieve a scaling factor of distributed training up to 99% over high-speed networks.



Abstract:Mainly because of the heavy computational costs, the real-time whole-body obstacle avoidance for the redundant manipulators has not been well implemented. This paper presents an approach that can ensure that the whole-body of a redundant manipulator can avoid moving obstacles in real-time during the execution of a task. The manipulator is divided into end-effector and non-end-effector portion. Based on dynamical systems (DS), the real-time end-effector obstacle avoidance is obtained. Besides, the end-effector can reach the given target. By using null-space velocity control, the real-time non-endeffector obstacle avoidance is achieved. Finally, a controller is designed to ensure the whole-body obstacle avoidance. We validate the effectiveness of the method in the simulations and experiments on the 7-DOF arm of the UBTECH humanoid robot.




Abstract:In this paper, based on Dynamical Systems (DS), we present an obstacle avoidance method that take into account workspace constraint for serial manipulators. Two modulation matrices that consider the effect of an obstacle and the workspace of a manipulator are determined when the obstacle does not intersect the workspace boundary and when the obstacle intersects the workspace boundary respectively. Using the modulation matrices, an original DS is deformed. The proposed approach can ensure that the trajectory of the manipulator computed according to the deformed DS neither penetrate the obstacle nor go out of the workspace. We validate the effectiveness of the approach in the simulations and experiments on the left arm of the UBTECH humanoid robot.




Abstract:One of the fundamental challenges to scale self-driving is being able to create accurate high definition maps (HD maps) with low cost. Current attempts to automate this process typically focus on simple scenarios, estimate independent maps per frame or do not have the level of precision required by modern self driving vehicles. In contrast, in this paper we focus on drawing the lane boundaries of complex highways with many lanes that contain topology changes due to forks and merges. Towards this goal, we formulate the problem as inference in a directed acyclic graphical model (DAG), where the nodes of the graph encode geometric and topological properties of the local regions of the lane boundaries. Since we do not know a priori the topology of the lanes, we also infer the DAG topology (i.e., nodes and edges) for each region. We demonstrate the effectiveness of our approach on two major North American Highways in two different states and show high precision and recall as well as 89% correct topology.




Abstract:In this paper we propose a novel end-to-end learnable network that performs joint perception, prediction and motion planning for self-driving vehicles and produces interpretable intermediate representations. Unlike existing neural motion planners, our motion planning costs are consistent with our perception and prediction estimates. This is achieved by a novel differentiable semantic occupancy representation that is explicitly used as cost by the motion planning process. Our network is learned end-to-end from human demonstrations. The experiments in a large-scale manual-driving dataset and closed-loop simulation show that the proposed model significantly outperforms state-of-the-art planners in imitating the human behaviors while producing much safer trajectories.




Abstract:Emerging non-volatile memory (NVM), or memristive, devices promise energy-efficient realization of deep learning, when efficiently integrated with mixed-signal integrated circuits on a CMOS substrate. Even though several algorithmic challenges need to be addressed to turn the vision of memristive Neuromorphic Systems-on-a-Chip (NeuSoCs) into reality, issues at the device and circuit interface need immediate attention from the community. In this work, we perform energy-estimation of a NeuSoC system and predict the desirable circuit and device parameters for energy-efficiency optimization. Also, CMOS synapse circuits based on the concept of CMOS memristor emulator are presented as a system prototyping methodology, while practical memristor devices are being developed and integrated with general-purpose CMOS. The proposed mixed-signal memristive synapse can be designed and fabricated using standard CMOS technologies and open doors to interesting applications in cognitive computing circuits.




Abstract:Brain-inspired learning mechanisms, e.g. spike timing dependent plasticity (STDP), enable agile and fast on-the-fly adaptation capability in a spiking neural network. When incorporating emerging nanoscale resistive non-volatile memory (NVM) devices, with ultra-low power consumption and high-density integration capability, a spiking neural network hardware would result in several orders of magnitude reduction in energy consumption at a very small form factor and potentially herald autonomous learning machines. However, actual memory devices have shown to be intrinsically binary with stochastic switching, and thus impede the realization of ideal STDP with continuous analog values. In this work, a dendritic-inspired processing architecture is proposed in addition to novel CMOS neuron circuits. The utilization of spike attenuations and delays transforms the traditionally undesired stochastic behavior of binary NVMs into a useful leverage that enables biologically-plausible STDP learning. As a result, this work paves a pathway to adopt practical binary emerging NVM devices in brain-inspired neuromorphic computing.