Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bin Yao

A Hierarchical Test Platform for Vision Language Model (VLM)-Integrated Real-World Autonomous Driving

Jun 17, 2025

Yupeng Zhou, Can Cui, Juntong Peng, Zichong Yang, Juanwu Lu, Jitesh H Panchal, Bin Yao, Ziran Wang

Abstract:Vision-Language Models (VLMs) have demonstrated notable promise in autonomous driving by offering the potential for multimodal reasoning through pretraining on extensive image-text pairs. However, adapting these models from broad web-scale data to the safety-critical context of driving presents a significant challenge, commonly referred to as domain shift. Existing simulation-based and dataset-driven evaluation methods, although valuable, often fail to capture the full complexity of real-world scenarios and cannot easily accommodate repeatable closed-loop testing with flexible scenario manipulation. In this paper, we introduce a hierarchical real-world test platform specifically designed to evaluate VLM-integrated autonomous driving systems. Our approach includes a modular, low-latency on-vehicle middleware that allows seamless incorporation of various VLMs, a clearly separated perception-planning-control architecture that can accommodate both VLM-based and conventional modules, and a configurable suite of real-world testing scenarios on a closed track that facilitates controlled yet authentic evaluations. We demonstrate the effectiveness of the proposed platform`s testing and evaluation ability with a case study involving a VLM-enabled autonomous vehicle, highlighting how our test framework supports robust experimentation under diverse conditions.

Via

Access Paper or Ask Questions

EDBench: Large-Scale Electron Density Data for Molecular Modeling

May 14, 2025

Hongxin Xiang, Ke Li, Mingquan Liu, Zhixiang Cheng, Bin Yao, Wenjie Du, Jun Xia, Li Zeng, Xin Jin, Xiangxiang Zeng

Abstract:Existing molecular machine learning force fields (MLFFs) generally focus on the learning of atoms, molecules, and simple quantum chemical properties (such as energy and force), but ignore the importance of electron density (ED) $\rho(r)$ in accurately understanding molecular force fields (MFFs). ED describes the probability of finding electrons at specific locations around atoms or molecules, which uniquely determines all ground state properties (such as energy, molecular structure, etc.) of interactive multi-particle systems according to the Hohenberg-Kohn theorem. However, the calculation of ED relies on the time-consuming first-principles density functional theory (DFT) which leads to the lack of large-scale ED data and limits its application in MLFFs. In this paper, we introduce EDBench, a large-scale, high-quality dataset of ED designed to advance learning-based research at the electronic scale. Built upon the PCQM4Mv2, EDBench provides accurate ED data, covering 3.3 million molecules. To comprehensively evaluate the ability of models to understand and utilize electronic information, we design a suite of ED-centric benchmark tasks spanning prediction, retrieval, and generation. Our evaluation on several state-of-the-art methods demonstrates that learning from EDBench is not only feasible but also achieves high accuracy. Moreover, we show that learning-based method can efficiently calculate ED with comparable precision while significantly reducing the computational cost relative to traditional DFT calculations. All data and benchmarks from EDBench will be freely available, laying a robust foundation for ED-driven drug discovery and materials science.

Via

Access Paper or Ask Questions

Enhancing Heterogeneous Knowledge Graph Completion with a Novel GAT-based Approach

Aug 05, 2024

Wanxu Wei, Yitong Song, Bin Yao

Abstract:Knowledge graphs (KGs) play a vital role in enhancing search results and recommendation systems. With the rapid increase in the size of the KGs, they are becoming inaccuracy and incomplete. This problem can be solved by the knowledge graph completion methods, of which graph attention network (GAT)-based methods stand out since their superior performance. However, existing GAT-based knowledge graph completion methods often suffer from overfitting issues when dealing with heterogeneous knowledge graphs, primarily due to the unbalanced number of samples. Additionally, these methods demonstrate poor performance in predicting the tail (head) entity that shares the same relation and head (tail) entity with others. To solve these problems, we propose GATH, a novel GAT-based method designed for Heterogeneous KGs. GATH incorporates two separate attention network modules that work synergistically to predict the missing entities. We also introduce novel encoding and feature transformation approaches, enabling the robust performance of GATH in scenarios with imbalanced samples. Comprehensive experiments are conducted to evaluate the GATH's performance. Compared with the existing SOTA GAT-based model on Hits@10 and MRR metrics, our model improves performance by 5.2% and 5.2% on the FB15K-237 dataset, and by 4.5% and 14.6% on the WN18RR dataset, respectively.

* ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 4, 2024

Via

Access Paper or Ask Questions

CS3: Cascade SAM for Sperm Segmentation

Jul 04, 2024

Yi Shi, Xu-Peng Tian, Yun-Kai Wang, Tie-Yi Zhang, Bin Yao, Hui Wang, Yong Shao, Cen-Cen Wang, Rong Zeng, De-Chuan Zhan

Abstract:Automated sperm morphology analysis plays a crucial role in the assessment of male fertility, yet its efficacy is often compromised by the challenges in accurately segmenting sperm images. Existing segmentation techniques, including the Segment Anything Model(SAM), are notably inadequate in addressing the complex issue of sperm overlap-a frequent occurrence in clinical samples. Our exploratory studies reveal that modifying image characteristics by removing sperm heads and easily segmentable areas, alongside enhancing the visibility of overlapping regions, markedly enhances SAM's efficiency in segmenting intricate sperm structures. Motivated by these findings, we present the Cascade SAM for Sperm Segmentation (CS3), an unsupervised approach specifically designed to tackle the issue of sperm overlap. This method employs a cascade application of SAM to segment sperm heads, simple tails, and complex tails in stages. Subsequently, these segmented masks are meticulously matched and joined to construct complete sperm masks. In collaboration with leading medical institutions, we have compiled a dataset comprising approximately 2,000 unlabeled sperm images to fine-tune our method, and secured expert annotations for an additional 240 images to facilitate comprehensive model assessment. Experimental results demonstrate superior performance of CS3 compared to existing methods.

Via

Access Paper or Ask Questions

Global-Position Tracking Control of 3-D Bipedal Walking via Virtual Constraint Design and Multiple Lyapunov Analysis

Aug 08, 2021

Yan Gu, Yuan Gao, Bin Yao, C. S. George Lee

Figure 1 for Global-Position Tracking Control of 3-D Bipedal Walking via Virtual Constraint Design and Multiple Lyapunov Analysis

Figure 2 for Global-Position Tracking Control of 3-D Bipedal Walking via Virtual Constraint Design and Multiple Lyapunov Analysis

Figure 3 for Global-Position Tracking Control of 3-D Bipedal Walking via Virtual Constraint Design and Multiple Lyapunov Analysis

Figure 4 for Global-Position Tracking Control of 3-D Bipedal Walking via Virtual Constraint Design and Multiple Lyapunov Analysis

Abstract:A safety-critical measure of legged locomotion performance is a robot's ability to track its desired time-varying position trajectory in an environment, which is herein termed as "global-position tracking". This paper introduces a nonlinear control approach that achieves asymptotic global-position tracking for three-dimensional (3-D) bipedal robot walking. Designing a global-position tracking controller presents a challenging problem due to the complex hybrid robot model and the time-varying desired global-position trajectory. Towards tackling this problem, the first main contribution is the construction of impact invariance to ensure all desired trajectories respect the foot-landing impact dynamics, which is a necessary condition for realizing asymptotic tracking of hybrid walking systems. Thanks to their independence of the desired global position, these conditions can be exploited to decouple the higher-level planning of the global position and the lower-level planning of the remaining trajectories, thereby greatly alleviating the computational burden of motion planning. The second main contribution is the Lyapunov-based stability analysis of the hybrid closed-loop system, which produces sufficient conditions to guide the controller design for achieving asymptotic global-position tracking during fully actuated walking. Simulations and experiments on a 3-D bipedal robot with twenty revolute joints confirm the validity of the proposed control approach in guaranteeing accurate tracking.

* 16 pages

Via

Access Paper or Ask Questions

Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration

Feb 18, 2020

Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Minyi Guo, Bin Yao

Figure 1 for Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration

Figure 2 for Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration

Figure 3 for Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration

Figure 4 for Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration

Abstract:The research interest in specialized hardware accelerators for deep neural networks (DNN) spiked recently owing to their superior performance and efficiency. However, today's DNN accelerators primarily focus on accelerating specific "kernels" such as convolution and matrix multiplication, which are vital but only part of an end-to-end DNN-enabled application. Meaningful speedups over the entire application often require supporting computations that are, while massively parallel, ill-suited to DNN accelerators. Integrating a general-purpose processor such as a CPU or a GPU incurs significant data movement overhead and leads to resource under-utilization on the DNN accelerators. We propose Simultaneous Multi-mode Architecture (SMA), a novel architecture design and execution model that offers general-purpose programmability on DNN accelerators in order to accelerate end-to-end applications. The key to SMA is the temporal integration of the systolic execution model with the GPU-like SIMD execution model. The SMA exploits the common components shared between the systolic-array accelerator and the GPU, and provides lightweight reconfiguration capability to switch between the two modes in-situ. The SMA achieves up to 63% performance improvement while consuming 23% less energy than the baseline Volta architecture with TensorCore.

* Accepted by DAC2020

Via

Access Paper or Ask Questions