Abstract:Recent studies have shown the great potential of diffusion models in improving reinforcement learning (RL) by modeling complex policies, expressing a high degree of multi-modality, and efficiently handling high-dimensional continuous control tasks. However, there is currently limited research on how to optimize diffusion-based polices (e.g., Diffusion Policy) fast and stably. In this paper, we propose an Adam-based Diffusion Policy Optimization (ADPO), a fast algorithmic framework containing best practices for fine-tuning diffusion-based polices in robotic control tasks using the adaptive gradient descent method in RL. Adaptive gradient method is less studied in training RL, let alone diffusion-based policies. We confirm that ADPO outperforms other diffusion-based RL methods in terms of overall effectiveness for fine-tuning on standard robotic tasks. Concretely, we conduct extensive experiments on standard robotic control tasks to test ADPO, where, particularly, six popular diffusion-based RL methods are provided as benchmark methods. Experimental results show that ADPO acquires better or comparable performance than the baseline methods. Finally, we systematically analyze the sensitivity of multiple hyperparameters in standard robotics tasks, providing guidance for subsequent practical applications. Our video demonstrations are released in https://github.com/Timeless-lab/ADPO.git.
Abstract:StochAstic Recursive grAdient algoritHm (SARAH), originally proposed for convex optimization and also proven to be effective for general nonconvex optimization, has received great attention due to its simple recursive framework for updating stochastic gradient estimates. The performance of SARAH significantly depends on the choice of step size sequence. However, SARAH and its variants often employ a best-tuned step size by mentor, which is time consuming in practice. Motivated by this gap, we proposed a variant of the Barzilai-Borwein (BB) method, referred to as the Random Barzilai-Borwein (RBB) method, to calculate step size for SARAH in the mini-batch setting, thereby leading to a new SARAH method: MB-SARAH-RBB. We prove that MB-SARAH-RBB converges linearly in expectation for strongly convex objective functions. We analyze the complexity of MB-SARAH-RBB and show that it is better than the original method. Numerical experiments on standard data sets indicate that MB-SARAH-RBB outperforms or matches state-of-the-art algorithms.
Abstract:The timely provision of traffic sign information to drivers is essential for the drivers to respond, to ensure safe driving, and to avoid traffic accidents in a timely manner. We proposed a timely visual recognizability quantitative evaluation method for traffic signs in large-scale transportation environments. To achieve this goal, we first address the concept of a visibility field to reflect the visible distribution of three-dimensional (3D) space and construct a traffic sign Visibility Evaluation Model (VEM) to measure the traffic sign visibility for a given viewpoint. Then, based on the VEM, we proposed the concept of the Visual Recognizability Field (VRF) to reflect the visual recognizability distribution in 3D space and established a Visual Recognizability Evaluation Model (VREM) to measure a traffic sign visual recognizability for a given viewpoint. Next, we proposed a Traffic Sign Timely Visual Recognizability Evaluation Model (TSTVREM) by combining VREM, the actual maximum continuous visual recognizable distance, and traffic big data to measure a traffic sign visual recognizability in different lanes. Finally, we presented an automatic algorithm to implement the TSTVREM model through traffic sign and road marking detection and classification, traffic sign environment point cloud segmentation, viewpoints calculation, and TSTVREM model realization. The performance of our method for traffic sign timely visual recognizability evaluation is tested on three road point clouds acquired by a mobile laser scanning system (RIEGL VMX-450) according to Road Traffic Signs and Markings (GB 5768-1999 in China), showing that our method is feasible and efficient.