Abstract:Trajectory prediction is a fundamental technology for advanced autonomous driving systems and represents one of the most challenging problems in the field of cognitive intelligence. Accurately predicting the future trajectories of each traffic participant is a prerequisite for building high safety and high reliability decision-making, planning, and control capabilities in autonomous driving. However, existing methods often focus solely on the motion of other traffic participants without considering the underlying intent behind that motion, which increases the uncertainty in trajectory prediction. Autonomous vehicles operate in real-time environments, meaning that trajectory prediction algorithms must be able to process data and generate predictions in real-time. While many existing methods achieve high accuracy, they often struggle to effectively handle heterogeneous traffic scenarios. In this paper, we propose a Subjective Intent-based Low-latency framework for Multiple traffic participants joint trajectory prediction. Our method explicitly incorporates the subjective intent of traffic participants based on their key points, and predicts the future trajectories jointly without map, which ensures promising performance while significantly reducing the prediction latency. Additionally, we introduce a novel dataset designed specifically for trajectory prediction. Related code and dataset will be available soon.
Abstract:Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost associated with training DRL models presents a significant performance bottleneck. To address this challenge and enable efficient scaling on high-performance computing architectures, this study focuses on optimizing DRL-based algorithms in parallel settings. We validate an existing state-of-the-art DRL framework used for AFC problems and discuss its efficiency bottlenecks. Subsequently, by deconstructing the overall framework and conducting extensive scalability benchmarks for individual components, we investigate various hybrid parallelization configurations and propose efficient parallelization strategies. Moreover, we refine input/output (I/O) operations in multi-environment DRL training to tackle critical overhead associated with data movement. Finally, we demonstrate the optimized framework for a typical AFC problem where near-linear scaling can be obtained for the overall framework. We achieve a significant boost in parallel efficiency from around 49% to approximately 78%, and the training process is accelerated by approximately 47 times using 60 CPU cores. These findings are expected to provide valuable insights for further advancements in DRL-based AFC studies.