the State Key Lab of Intelligent Control and Decision of Complex Systems and the School of Automation, Beijing Institute of Technology, Beijing, China, Beijing Institute of Technology Chongqing Innovation Center, Chongqing, China




Abstract:The integration of Autonomous Vehicles (AVs) into existing human-driven traffic systems poses considerable challenges, especially within environments where human and machine interactions are frequent and complex, such as at unsignalized intersections. Addressing these challenges, we introduce a novel framework predicated on dynamic and socially-aware decision-making game theory to augment the social decision-making prowess of AVs in mixed driving environments.This comprehensive framework is delineated into three primary modules: Social Tendency Recognition, Mixed-Strategy Game Modeling, and Expert Mode Learning. We introduce 'Interaction Orientation' as a metric to evaluate the social decision-making tendencies of various agents, incorporating both environmental factors and trajectory data. The mixed-strategy game model developed as part of this framework considers the evolution of future traffic scenarios and includes a utility function that balances safety, operational efficiency, and the unpredictability of environmental conditions. To adapt to real-world driving complexities, our framework utilizes dynamic optimization techniques for assimilating and learning from expert human driving strategies. These strategies are compiled into a comprehensive library, serving as a reference for future decision-making processes. Our approach is validated through extensive driving datasets, and the results demonstrate marked enhancements in decision timing, precision.




Abstract:This paper focuses on the multivariate time series imputation problem using deep neural architectures. The ubiquitous issue of missing data in both scientific and engineering tasks necessitates the development of an effective and general imputation model. Leveraging the wisdom and expertise garnered from low-rank imputation methods, we power the canonical Transformers with three key knowledge-driven enhancements, including projected temporal attention, global adaptive graph convolution, and Fourier imputation loss. These task-agnostic inductive biases exploit the inherent structures of incomplete time series, and thus make our model versatile for a variety of imputation problems. We demonstrate its superiority in terms of accuracy, efficiency, and flexibility on heterogeneous datasets, including traffic speed, traffic volume, solar energy, smart metering, and air quality. Comprehensive case studies are performed to further strengthen the interpretability. Promising empirical results provide strong conviction that incorporating time series primitives, such as low-rank properties, can substantially facilitate the development of a generalizable model to approach a wide range of spatiotemporal imputation problems.
Abstract:In this paper, a three-dimensional (3-D) non-stationary wideband multiple-input multiple-output (MIMO) channel model based on the WINNER+ channel model is proposed. The angular distributions of clusters in both the horizontal and vertical planes are jointly considered. The receiver and clusters can be moving, which makes the model more general. Parameters including number of clusters, powers, delays, azimuth angles of departure (AAoDs), azimuth angles of arrival (AAoAs), elevation angles of departure (EAoDs), and elevation angles of arrival (EAoAs) are time-variant. The cluster time evolution is modeled using a birth-death process. Statistical properties, including spatial cross-correlation function (CCF), temporal autocorrelation function (ACF), Doppler power spectrum density (PSD), level-crossing rate (LCR), average fading duration (AFD), and stationary interval are investigated and analyzed. The LCR, AFD, and stationary interval of the proposed channel model are validated against the measurement data. Numerical and simulation results show that the proposed channel model has the ability to reproduce the main properties of real non-stationary channels. Furthermore, the proposed channel model can be adapted to various communication scenarios by adjusting different parameter values.




Abstract:Conditional score-based diffusion model (SBDM) is for conditional generation of target data with paired data as condition, and has achieved great success in image translation. However, it requires the paired data as condition, and there would be insufficient paired data provided in real-world applications. To tackle the applications with partially paired or even unpaired dataset, we propose a novel Optimal Transport-guided Conditional Score-based diffusion model (OTCS) in this paper. We build the coupling relationship for the unpaired or partially paired dataset based on $L_2$-regularized unsupervised or semi-supervised optimal transport, respectively. Based on the coupling relationship, we develop the objective for training the conditional score-based model for unpaired or partially paired settings, which is based on a reformulation and generalization of the conditional SBDM for paired setting. With the estimated coupling relationship, we effectively train the conditional score-based model by designing a ``resampling-by-compatibility'' strategy to choose the sampled data with high compatibility as guidance. Extensive experiments on unpaired super-resolution and semi-paired image-to-image translation demonstrated the effectiveness of the proposed OTCS model. From the viewpoint of optimal transport, OTCS provides an approach to transport data across distributions, which is a challenge for OT on large-scale datasets. We theoretically prove that OTCS realizes the data transport in OT with a theoretical bound. Code is available at \url{https://github.com/XJTU-XGU/OTCS}.
Abstract:Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments. These approaches begin by constructing a parameterized simulation world model of the real environment through self-supervised learning. By leveraging the imagination of the world model, the agent's policy is enhanced without the constraints of sampling from the real environment. The performance of these algorithms heavily relies on the sequence modeling and generation capabilities of the world model. However, constructing a perfectly accurate model of a complex unknown environment is nearly impossible. Discrepancies between the model and reality may cause the agent to pursue virtual goals, resulting in subpar performance in the real environment. Introducing random noise into model-based reinforcement learning has been proven beneficial. In this work, we introduce Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines the strong sequence modeling and generation capabilities of Transformers with the stochastic nature of variational autoencoders. STORM achieves a mean human performance of $126.7\%$ on the Atari $100$k benchmark, setting a new record among state-of-the-art methods that do not employ lookahead search techniques. Moreover, training an agent with $1.85$ hours of real-time interaction experience on a single NVIDIA GeForce RTX 3090 graphics card requires only $4.3$ hours, showcasing improved efficiency compared to previous methodologies.
Abstract:Purpose: To identify ocular hypertension (OHT) subtypes with different trends of visual field (VF) progression based on unsupervised machine learning and to discover factors associated with fast VF progression. Participants: A total of 3133 eyes of 1568 ocular hypertension treatment study (OHTS) participants with at least five follow-up VF tests were included in the study. Methods: We used a latent class mixed model (LCMM) to identify OHT subtypes using standard automated perimetry (SAP) mean deviation (MD) trajectories. We characterized the subtypes based on demographic, clinical, ocular, and VF factors at the baseline. We then identified factors driving fast VF progression using generalized estimating equation (GEE) and justified findings qualitatively and quantitatively. Results: The LCMM model discovered four clusters (subtypes) of eyes with different trajectories of MD worsening. The number of eyes in clusters were 794 (25%), 1675 (54%), 531 (17%) and 133 (4%). We labelled the clusters as Improvers, Stables, Slow progressors, and Fast progressors based on their mean of MD decline, which were 0.08, -0.06, -0.21, and -0.45 dB/year, respectively. Eyes with fast VF progression had higher baseline age, intraocular pressure (IOP), pattern standard deviation (PSD) and refractive error (RE), but lower central corneal thickness (CCT). Fast progression was associated with calcium channel blockers, being male, heart disease history, diabetes history, African American race, stroke history, and migraine headaches.




Abstract:The advent of autonomous vehicles (AVs) alongside human-driven vehicles (HVs) has ushered in an era of mixed traffic flow, presenting a significant challenge: the intricate interaction between these entities within complex driving environments. AVs are expected to have human-like driving behavior to seamlessly integrate into human-dominated traffic systems. To address this issue, we propose a reinforcement learning framework that considers driving priors and Social Coordination Awareness (SCA) to optimize the behavior of AVs. The framework integrates a driving prior learning (DPL) model based on a variational autoencoder to infer the driver's driving priors from human drivers' trajectories. A policy network based on a multi-head attention mechanism is designed to effectively capture the interactive dependencies between AVs and other traffic participants to improve decision-making quality. The introduction of SCA into the autonomous driving decision-making system, and the use of Coordination Tendency (CT) to quantify the willingness of AVs to coordinate the traffic system is explored. Simulation results show that the proposed framework can not only improve the decision-making quality of AVs but also motivate them to produce social behaviors, with potential benefits for the safety and traffic efficiency of the entire transportation system.




Abstract:Generating 3D faces from textual descriptions has a multitude of applications, such as gaming, movie, and robotics. Recent progresses have demonstrated the success of unconditional 3D face generation and text-to-3D shape generation. However, due to the limited text-3D face data pairs, text-driven 3D face generation remains an open problem. In this paper, we propose a text-guided 3D faces generation method, refer as TG-3DFace, for generating realistic 3D faces using text guidance. Specifically, we adopt an unconditional 3D face generation framework and equip it with text conditions, which learns the text-guided 3D face generation with only text-2D face data. On top of that, we propose two text-to-face cross-modal alignment techniques, including the global contrastive learning and the fine-grained alignment module, to facilitate high semantic consistency between generated 3D faces and input texts. Besides, we present directional classifier guidance during the inference process, which encourages creativity for out-of-domain generations. Compared to the existing methods, TG-3DFace creates more realistic and aesthetically pleasing 3D faces, boosting 9% multi-view consistency (MVIC) over Latent3D. The rendered face images generated by TG-3DFace achieve higher FID and CLIP score than text-to-2D face/image generation models, demonstrating our superiority in generating realistic and semantic-consistent textures.
Abstract:Discrete reinforcement learning (RL) algorithms have demonstrated exceptional performance in solving sequential decision tasks with discrete action spaces, such as Atari games. However, their effectiveness is hindered when applied to continuous control problems due to the challenge of dimensional explosion. In this paper, we present the Soft Decomposed Policy-Critic (SDPC) architecture, which combines soft RL and actor-critic techniques with discrete RL methods to overcome this limitation. SDPC discretizes each action dimension independently and employs a shared critic network to maximize the soft $Q$-function. This novel approach enables SDPC to support two types of policies: decomposed actors that lead to the Soft Decomposed Actor-Critic (SDAC) algorithm, and decomposed $Q$-networks that generate Boltzmann soft exploration policies, resulting in the Soft Decomposed-Critic Q (SDCQ) algorithm. Through extensive experiments, we demonstrate that our proposed approach outperforms state-of-the-art continuous RL algorithms in a variety of continuous control tasks, including Mujoco's Humanoid and Box2d's BipedalWalker. These empirical results validate the effectiveness of the SDPC architecture in addressing the challenges associated with continuous control.




Abstract:Autonomous driving technology is poised to transform transportation systems. However, achieving safe and accurate multi-task decision-making in complex scenarios, such as unsignalized intersections, remains a challenge for autonomous vehicles. This paper presents a novel approach to this issue with the development of a Multi-Task Decision-Making Generative Pre-trained Transformer (MTD-GPT) model. Leveraging the inherent strengths of reinforcement learning (RL) and the sophisticated sequence modeling capabilities of the Generative Pre-trained Transformer (GPT), the MTD-GPT model is designed to simultaneously manage multiple driving tasks, such as left turns, straight-ahead driving, and right turns at unsignalized intersections. We initially train a single-task RL expert model, sample expert data in the environment, and subsequently utilize a mixed multi-task dataset for offline GPT training. This approach abstracts the multi-task decision-making problem in autonomous driving as a sequence modeling task. The MTD-GPT model is trained and evaluated across several decision-making tasks, demonstrating performance that is either superior or comparable to that of state-of-the-art single-task decision-making models.