Abstract: Autonomous driving systems face significant challenges in perceiving complex environments and making real-time decisions. Traditional modular approaches, while offering interpretability, suffer from error propagation and coordination issues, whereas end-to-end learning systems can simplify the design but face computational bottlenecks. This paper presents a novel approach to autonomous driving using deep reinforcement learning (DRL) that integrates bird's-eye view (BEV) perception for enhanced real-time decision-making. We introduce the \texttt{Mamba-BEV} model, an efficient spatio-temporal feature extraction network that combines BEV-based perception with the Mamba framework for temporal feature modeling. This integration allows the system to encode vehicle surroundings and road features in a unified coordinate system and accurately model long-range dependencies. Building on this, we propose the \texttt{ME$^3$-BEV} framework, which utilizes the \texttt{Mamba-BEV} model as a feature input for end-to-end DRL, achieving superior performance in dynamic urban driving scenarios. We further enhance the interpretability of the model by visualizing high-dimensional features through semantic segmentation, providing insight into the learned representations. Extensive experiments on the CARLA simulator demonstrate that \texttt{ME$^3$-BEV} outperforms existing models across multiple metrics, including collision rate and trajectory accuracy, offering a promising solution for real-time autonomous driving.
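As a rough illustration of the pipeline shape this abstract describes (a BEV encoder, a temporal state-space block, and a DRL policy head), the following PyTorch sketch uses a simplified diagonal state-space recurrence as a stand-in for the actual Mamba selective scan; the class names, layer sizes, frame count, and two-dimensional action space are illustrative assumptions, not the authors' implementation.

\begin{verbatim}
import torch
import torch.nn as nn

class SimpleSSM(nn.Module):
    """Diagonal linear state-space recurrence over a feature sequence
    (a simplified stand-in for the Mamba selective scan)."""
    def __init__(self, dim):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(dim))  # per-channel decay logits
        self.b = nn.Linear(dim, dim)
        self.c = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, time, dim)
        h = torch.zeros(x.size(0), x.size(2), device=x.device)
        decay = torch.sigmoid(self.a)
        outs = []
        for t in range(x.size(1)):               # sequential scan over frames
            h = decay * h + self.b(x[:, t])
            outs.append(self.c(h))
        return torch.stack(outs, dim=1)

class BEVActor(nn.Module):
    """Encodes BEV rasters, models temporal context, emits [steer, throttle]."""
    def __init__(self, bev_channels=6, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(bev_channels, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
        self.temporal = SimpleSSM(dim)
        self.policy = nn.Linear(dim, 2)          # [steer, throttle]

    def forward(self, bev_seq):                  # (batch, time, C, H, W)
        b, t = bev_seq.shape[:2]
        feats = self.encoder(bev_seq.flatten(0, 1)).view(b, t, -1)
        return torch.tanh(self.policy(self.temporal(feats)[:, -1]))

actor = BEVActor()
action = actor(torch.randn(1, 4, 6, 64, 64))     # 4 BEV frames -> (1, 2)
\end{verbatim}

In a full system an actor of this shape would be trained with a standard DRL algorithm against the simulator's reward; the sketch fixes only the data flow from BEV frames to control outputs.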
Abstract: Vision-and-Language Navigation (VLN) is a challenging task in which an agent must understand language instructions and navigate unfamiliar environments using visual cues. The agent must accurately locate the target based on visual information from the environment and complete tasks through interaction with its surroundings. Despite significant advances in this field, two major limitations persist: (1) many existing methods feed complete language instructions directly into multi-layer Transformer networks without fully exploiting the detailed information within the instructions, limiting the agent's language understanding during task execution; (2) current approaches often overlook the modeling of object relationships across modalities, failing to exploit latent clues between objects, which degrades the accuracy and robustness of navigation decisions. To address these issues, we propose a Dual Object Perception-Enhancement Network (DOPE) that improves navigation performance. First, we design a Text Semantic Extraction (TSE) module to extract the essential phrases from the instructions and feed them into a Text Object Perception-Augmentation (TOPA) module, fully leveraging details such as objects and actions. Second, we introduce an Image Object Perception-Augmentation (IOPA) module that additionally models object information across modalities, enabling the model to exploit latent clues between objects in images and text and enhancing decision-making accuracy. Extensive experiments on the R2R and REVERIE datasets validate the efficacy of the proposed approach.
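The cross-modal object modeling attributed to IOPA can be pictured as object-level cross-attention, in which image object features attend over embedded object phrases from the instruction. The PyTorch sketch below is a generic version of that idea under assumed tensor shapes; it is not the DOPE architecture itself.

\begin{verbatim}
import torch
import torch.nn as nn

class ObjectCrossAttention(nn.Module):
    """Image object features query object phrases from the instruction."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_objects, text_objects):
        # img_objects:  (batch, n_img_obj, dim) detected-object features
        # text_objects: (batch, n_txt_obj, dim) embedded object phrases
        fused, _ = self.attn(query=img_objects,
                             key=text_objects, value=text_objects)
        return self.norm(img_objects + fused)    # residual fusion

fuse = ObjectCrossAttention()
out = fuse(torch.randn(2, 12, 256), torch.randn(2, 5, 256))  # (2, 12, 256)
\end{verbatim}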
Abstract: In this paper, a learning-based Model Predictive Control (MPC) scheme using a low-dimensional residual model is proposed for autonomous driving. One of the critical challenges in autonomous driving is the complexity of vehicle dynamics, which impedes the formulation of an accurate vehicle model. An inaccurate vehicle model can significantly degrade the performance of the MPC controller. To address this issue, this paper decomposes the nominal vehicle model into invariable and variable elements. The accuracy of the invariable component is ensured by calibration, while the deviations in the variable elements are learned by a low-dimensional residual model. The features of the residual model are selected as the physical variables most correlated with the nominal model errors. Physical constraints among these features are formulated to explicitly define the valid region within the feature space. The formulated model and constraints are incorporated into the MPC framework and validated through both simulation and real-vehicle experiments. The results indicate that the proposed method significantly enhances model accuracy and controller performance.
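A minimal numerical sketch of the nominal-plus-residual decomposition described here, assuming a kinematic bicycle as the nominal model, a linear residual acting on the yaw-rate channel, and simple box bounds as the feature-space validity region; the wheelbase, feature choice, and weights are illustrative, not the paper's.

\begin{verbatim}
import numpy as np

L = 2.7  # wheelbase [m]; a calibrated, invariable element

def nominal_step(state, u, dt=0.05):
    """Kinematic bicycle: state = [x, y, yaw, v], u = [accel, steer]."""
    x, y, yaw, v = state
    a, delta = u
    return np.array([x + v * np.cos(yaw) * dt,
                     y + v * np.sin(yaw) * dt,
                     yaw + v / L * np.tan(delta) * dt,
                     v + a * dt])

def corrected_step(state, u, w, dt=0.05):
    """Nominal step plus a learned low-dimensional residual on yaw rate."""
    v, delta = state[3], u[1]
    feats = np.array([v, delta, v * delta])  # variables correlated with error
    # explicit validity region: features must stay in the trained range
    assert abs(v) <= 20.0 and abs(delta) <= 0.5, "outside validity region"
    nxt = nominal_step(state, u, dt)
    nxt[2] += (feats @ w) * dt               # correct only the yaw dynamics
    return nxt

w = np.array([0.001, 0.02, -0.005])          # illustrative fitted weights
print(corrected_step(np.array([0.0, 0.0, 0.0, 10.0]),
                     np.array([0.5, 0.1]), w))
\end{verbatim}

Inside an MPC rollout, corrected_step would replace the nominal predictor, and the box bounds would enter the optimization as constraints rather than assertions.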
Abstract: Numerical reasoning over hybrid table-and-text passages, such as financial reports, poses significant challenges and has numerous potential applications. Noise and irrelevant variables in the model input hinder performance, and coarse-grained supervision over the whole solution program impedes the model's ability to learn the underlying numerical reasoning process. In this paper, we propose three pretraining tasks that operate at both the whole-program and sub-program level: Variable Integrity Ranking, which guides the model to focus on useful variables; Variable Operator Prediction, which decomposes the supervision into fine-grained single-operator prediction; and Variable Keyphrase Masking, which encourages the model to identify the key evidence from which sub-programs are derived. Experimental results demonstrate the effectiveness of our proposed methods, which surpass transformer-based baselines.
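To make the sub-program-level supervision concrete, the snippet below decomposes a solution program into single-operator steps of the kind Variable Operator Prediction would supervise. The program syntax is assumed (a FinQA-style DSL, with #k referencing the result of step k); this is an illustration, not the authors' preprocessing code.

\begin{verbatim}
import re

def sub_programs(prog):
    """Return (operator, args) steps, innermost operators first."""
    pattern = re.compile(r"(\w+)\(([^()]*)\)")
    steps = []
    while True:
        m = pattern.search(prog)
        if m is None:
            break
        op = m.group(1)
        args = [a.strip() for a in m.group(2).split(",")]
        steps.append((op, args))
        # replace the resolved call with a reference to its step index
        prog = prog[:m.start()] + "#" + str(len(steps) - 1) + prog[m.end():]
    return steps

print(sub_programs("divide(subtract(1220, 1086), 1086)"))
# -> [('subtract', ['1220', '1086']), ('divide', ['#0', '1086'])]
\end{verbatim}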
Abstract: Thanks to their added convenience, safety advantages, and potential commercial value, intelligent vehicles (IVs) have attracted wide attention throughout the world. Although a few autonomous driving unicorns assert that IVs will be commercially deployable by 2025, their implementation remains restricted to small-scale validation due to various issues, among which the precise computation of control commands or trajectories by planning methods remains a prerequisite for IVs. This paper reviews state-of-the-art planning methods, including pipeline planning and end-to-end planning methods. For pipeline methods, a survey of algorithm selection is provided along with a discussion of the expansion and optimization mechanisms, whereas for end-to-end methods, the training approaches and verification scenarios of driving tasks are the points of concern. Experimental platforms are reviewed to help readers select suitable training and validation methods. Finally, the current challenges and future directions are discussed. The side-by-side comparison presented in this survey not only offers insight into the strengths and limitations of the reviewed methods but also assists with system-level design choices.
Abstract: Expressing various facial emotions is an important social ability for efficient communication between humans. A key challenge in human-robot interaction research is providing androids with the ability to make various human-like facial expressions for efficient communication with humans. The android Nikola, which we have developed, is equipped with many actuators for facial muscle control. While this enables Nikola to simulate various human expressions, it also complicates the identification of the optimal parameters for producing desired expressions. Here, we propose a novel method that automatically optimizes the facial expressions of our android. We use a machine-vision algorithm to evaluate the magnitudes of seven basic emotions and employ the Bayesian optimization algorithm to identify the parameters that produce the most convincing facial expressions. Evaluations by naive human participants demonstrate that our method improves the rated strength of the android's facial expressions of anger, disgust, sadness, and surprise compared with the previous method, which relied on Ekman's theory and parameter adjustments by a human expert.
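The search loop this abstract describes can be sketched with an off-the-shelf Bayesian optimizer such as scikit-optimize's gp_minimize. The actuator count, the normalized bounds, and the toy quadratic objective standing in for the render-and-rate step are all assumptions for illustration; the real objective would drive Nikola's actuators and return the machine-vision rater's score.

\begin{verbatim}
from skopt import gp_minimize

N_ACTUATORS = 10                          # illustrative; Nikola has more
bounds = [(0.0, 1.0)] * N_ACTUATORS       # normalized actuator commands
target = [0.8, 0.2] * (N_ACTUATORS // 2)  # pretend-ideal pose (toy only)

def neg_emotion_score(u):
    # Real system: set actuators to u, capture the face, return the
    # negative classifier score for the target emotion (minimized).
    # Here a toy quadratic keeps the sketch runnable.
    return sum((ui - ti) ** 2 for ui, ti in zip(u, target))

result = gp_minimize(neg_emotion_score, bounds, n_calls=40, random_state=0)
print("best actuator setting:", [round(v, 2) for v in result.x])
\end{verbatim}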